Re: Modules

Vlad Rozov Tue, 03 Nov 2015 10:08:52 -0800

This brings up one more issue that we did not cover in the module design discussions. What happens when new versions of operators embedded into a module become available? I believe we all go with the assumption that the module will pick up the version available on the classpath at run-time and that it is the operator developer's responsibility to provide full binary compatibility. Another possible behavior is to consider a module as a completely independent unit and package all necessary libraries along with the module.


Thank you,


Vlad

On 11/3/15 09:40, Pramod Immaneni wrote:

I am not suggesting we leak the internals or compromise access modifiers. I
want the module developer to have the ability (not mandatory) to make
available all or a subset of the properties of an operator easily if they
desire, without having to create a setter/getter for each of them. You don't
have to expose the operator they belong to. My preference would also be to
preserve the namespace of the properties in some way, for example by
grouping them by operator name. Think about a scenario where people have
built modules using the Kafka input operator and there is a new Kafka
connection property. Without this ability the modules have to be
changed to support this property. With this feature the module developers
have a choice whether to keep the list of Kafka properties fixed in the
module or allow new properties.
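
The choice described above (a fixed list of exposed properties versus all properties of an operator) could be sketched roughly as follows. This is only an illustration under stated assumptions: ExposedPropertyRegistry and its methods are hypothetical names invented here, not part of Apex or of the pull request, and the operator name simply acts as the property group.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: this type does not exist in Apex.
class ExposedPropertyRegistry {
    private static final String ALL = "*";

    // Operator name (the property group) -> exposed property names, or "*" for all.
    private final Map<String, Set<String>> exposed = new HashMap<>();
    // "<operator>.<property>" -> value set by the module user.
    private final Map<String, String> values = new HashMap<>();

    // Module developer exposes a fixed subset of an operator's properties.
    void expose(String operatorName, String... propertyNames) {
        Set<String> names = exposed.computeIfAbsent(operatorName, k -> new HashSet<>());
        Collections.addAll(names, propertyNames);
    }

    // Module developer exposes every property of the operator, including future ones.
    void exposeAll(String operatorName) {
        expose(operatorName, ALL);
    }

    boolean isExposed(String operatorName, String propertyName) {
        Set<String> names = exposed.get(operatorName);
        return names != null && (names.contains(ALL) || names.contains(propertyName));
    }

    // Module user sets a property by operator (group) name; fails if it is not exposed.
    void set(String operatorName, String propertyName, String value) {
        if (!isExposed(operatorName, propertyName)) {
            throw new IllegalArgumentException(
                "Property not exposed by module: " + operatorName + "." + propertyName);
        }
        values.put(operatorName + "." + propertyName, value);
    }

    String get(String operatorName, String propertyName) {
        return values.get(operatorName + "." + propertyName);
    }
}
```

With exposeAll, a newly added connection property on the operator becomes settable without touching the module code; with expose, the list stays fixed and anything else is rejected.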

Thanks

On Tue, Nov 3, 2015 at 9:31 AM, Thomas Weise <tho...@datatorrent.com> wrote:

There is also the option to inherit common module properties through a base
class.

I don't see how this is any different from an operator. The developer
decides what gets exposed and has the same options to control it.

Encapsulation is good practice; leaking the module internals makes the
code that uses the module brittle.
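
The base-class option mentioned in this reply might look roughly like the sketch below. The class and property names (BaseMessagingModuleSketch, ClickStreamModuleSketch, brokerList, and so on) are hypothetical and chosen only for illustration; a real module would also implement the Module interface from the pull request, which is left out here to keep the example self-contained.

```java
// Hypothetical sketch: these classes are illustrative and not part of Apex.
abstract class BaseMessagingModuleSketch {
    // Common properties shared by every module derived from this base class.
    private String brokerList;
    private int connectionTimeoutMillis = 30_000;

    public String getBrokerList() {
        return brokerList;
    }

    public void setBrokerList(String brokerList) {
        this.brokerList = brokerList;
    }

    public int getConnectionTimeoutMillis() {
        return connectionTimeoutMillis;
    }

    public void setConnectionTimeoutMillis(int connectionTimeoutMillis) {
        this.connectionTimeoutMillis = connectionTimeoutMillis;
    }
}

// A concrete module inherits the common properties and adds only its own.
class ClickStreamModuleSketch extends BaseMessagingModuleSketch {
    private String topic;

    public String getTopic() {
        return topic;
    }

    public void setTopic(String topic) {
        this.topic = topic;
    }
}
```

The common setters and getters are written once in the base class, so a new shared property needs a change in one place rather than in every module.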

Thomas

On Tue, Nov 3, 2015 at 9:22 AM, Amol Kekre <a...@datatorrent.com> wrote:

The same goes for a Java or C++ class that changes its API. In general this
is left to the developer, and these languages have internals as private by
default for precisely the same purpose. The module developer must have the
right to change internals, keep the API clean/constant and expect user code
to not break.

Thks,
Amol


On Tue, Nov 3, 2015 at 9:19 AM, Pramod Immaneni <pra...@datatorrent.com>
wrote:

For 3 and 4, can't we strike a balance between not having to expose the
operators underneath and at the same time not having to write boilerplate
code for all the properties that the module wants to make available
outside? It can quickly become unmanageable. For example, an input operator
has a new connection property which can be used outside, and now all the
modules using that operator have to be modified just to add a pass-through
setter/getter. How about treating the operator name as a group name and
giving module developers the ability to easily make available/specify all
or a subset of the properties of an operator to the user without having to
explicitly make each of them a module property?

On Mon, Nov 2, 2015 at 5:00 PM, Amol Kekre <a...@datatorrent.com>
wrote:

3, 4 should follow conventions where the creator decides the API (including
accessibility). In general only those properties exposed by the module
creator should be settable. What the module internally does with them is
the module designer's call. Accessing internals of a module from outside is
uncommon. For example, in Java (or C++) private fields/members are not to be
accessed. Properties (setter and getter) are the API that the module
designer gives to the module user. It is dangerous and has unintended
consequences if the module user starts to access internals outside the API.

Partitioning should be the next phase. As long as the current design does
not halt partitioning it should be ok (which I believe is true).

Thks,
Amol


On Mon, Nov 2, 2015 at 3:44 PM, Vlad Rozov <v.ro...@datatorrent.com>
wrote:

1. +1, though passing the original DAG to the module's populateDAG() is not
by design; it is an implementation detail of the current pull request.

2. While I agree that both Module and StreamingApplication let the
module/application designer expose a DAG design reuse pattern and that the
StreamingApplication interface may extend Module, it does not seem to buy
us much. Do we want to allow certain applications to be reused as Modules
in other applications, or should an application package be different from a
Module package? The current approach is to distribute Modules as part of a
.jar, for example as part of the Malhar library, without necessarily
providing all necessary dependencies. An application package, on the other
hand, must include all dependencies not provided by the platform.

3, 4. While this will help the Module designer, it may complicate Module
maintenance and how Modules are used. What if the Module designer wants to
change its implementation and replace one operator implementation with
another operator? Does the StreamingApplication designer need to know the
internal structure of Modules? Should a Module be considered a black box
during Application design time, as was initially planned?

5, 6, 7. +1. This is the currently proposed behavior of Module
functionality, the way I understand it.

8. We need to see what the Module designer can specify for partitioning.
One of the supported cases should be the ability to specify a cascading
partitioning scheme.

Thank you,

Vlad


On 11/2/15 10:30, Pramod Immaneni wrote:

I have some comments and suggestions on the module design. I think these
need to be taken into account before we can merge the implementation
provided below into the mainline code. I apologize if these should have
been brought up earlier, as for some reason or the other I was out of the
loop on this one.

      https://github.com/apache/incubator-apex-core/pull/148
<https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963>

      1. DAG scoping currently in the implementation is global for modules;
each module's populateDAG sees the entire DAG. It should be locally scoped
as one module does not and should not know about another.
      2. The module has a populateDAG method with the exact same syntax as in
StreamingApplication. Is StreamingApplication also a module? Should it
extend that interface?
      3. Setting properties for modules is too verbose. The module developer
needs to repeat every property they want exposed with a setter and getter
in Java. I don't disagree that the module developer should be able to choose
which properties from which operators need to be exposed, but the current
way seems to duplicate code. Here is a suggestion.
           a. Allow modules to specify which operators and properties can be
accessible from outside. One way is in the "populateDAG" method of the
module: when adding the operator, have the ability to specify whether this
operator can be accessible from outside and which (or all) properties can
be accessible.
           b. Provide methods in ModuleMeta or elsewhere to set property
values by specifying the operator name (friendly name) inside the module
and the property name. If this is allowed by a. above it is successful,
else it should fail.
           c. Allow a syntax in property files to specify the property in b.
Example syntax
dt.module.<modulename>.operator.<operatorname>.prop.<propname>
      4. For attributes, the same mechanism as in 3 should apply for the
operators that are exposed by the module. For the property file, example
syntax
dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
      5. Module developers in addition to 3. and 4. above may choose to
support module level properties and attributes. These should not be the
default when 3. and 4. are possible but complementary, in addition to them.
In this case for properties they can implement setters and getters in the
module. For attributes the user should still be able to set the attributes
using the dag setAttribute method. You could introduce a method in the
module to process attributes that can get called by the engine once
everything is set.
      6. For 5. above setting global properties and attributes for module is
akin to ideas that have been proposed for the application as well. A
consistent way must be possible for applications as well even if it is not
implemented now.
      7. For 5. or 6. above there should be a property file way of specifying
the global module properties and attributes. Example syntax
dt.module.<modulename>.prop.<propname>,
dt.module.<modulename>.attr.<attrname>.
Notice the difference with 3. c. and 4 above that there is no operator
keyword here.
      8. Partitioning needs to be consistent with what the user will expect
when they see module as an entity. I will send an image of possible
examples of how the user will expect the physical plan to look in certain
cases.
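
The property-file key shapes proposed in 3. c., 4 and 7 can be told apart mechanically. The sketch below is one possible way to parse them, not how the platform does it: ModulePropertyKey is a hypothetical class, and it assumes that module and operator names contain no dots while the trailing property or attribute name may.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the key shapes come from the proposal; the parser is illustrative.
final class ModulePropertyKey {
    // dt.module.<modulename>[.operator.<operatorname>].(prop|attr).<name>
    private static final Pattern KEY = Pattern.compile(
        "^dt\\.module\\.([^.]+)(?:\\.operator\\.([^.]+))?\\.(prop|attr)\\.(.+)$");

    final String moduleName;
    final String operatorName; // null for module-level keys (item 7)
    final boolean attribute;   // true for .attr. keys, false for .prop. keys
    final String name;

    private ModulePropertyKey(String moduleName, String operatorName,
                              boolean attribute, String name) {
        this.moduleName = moduleName;
        this.operatorName = operatorName;
        this.attribute = attribute;
        this.name = name;
    }

    // Returns an empty Optional when the key does not follow the proposed syntax.
    static Optional<ModulePropertyKey> parse(String key) {
        Matcher m = KEY.matcher(key);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new ModulePropertyKey(
            m.group(1), m.group(2), "attr".equals(m.group(3)), m.group(4)));
    }

    // Module-level keys have no "operator" keyword in them.
    boolean isModuleLevel() {
        return operatorName == null;
    }
}
```

A key such as dt.module.ingest.operator.kafkaInput.prop.topic (names made up for the example) parses to an operator-level property, while dt.module.ingest.attr.<attrname> parses to a module-level attribute because the operator keyword is absent.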

Thanks
