I have some comments and suggestions on the module design. I think these
need to be taken into account before we can merge the implementation
in the pull request below into the mainline code. I apologize if these
should have been brought up earlier; for one reason or another I was out
of the loop on this one.

    https://github.com/apache/incubator-apex-core/pull/148
<https://github.com/apache/incubator-apex-core/pull/148#issuecomment-153104963>

    1. DAG scoping in the current implementation is global for modules:
each module's populateDAG sees the entire DAG. It should be locally scoped,
as one module does not and should not know about another.
    2. The module has a populateDAG method with exactly the same signature
as in StreamingApplication. Is StreamingApplication also a module? Should
it extend that interface?
    3. Setting properties for modules is too verbose. The module developer
needs to repeat every property they want exposed with a setter and getter
in Java. I agree that the module developer should be able to choose which
properties from which operators are exposed, but the current approach
duplicates code. Here is a suggestion.
         a. Allow modules to specify which operators and properties are
accessible from outside. One way is for the module's populateDAG method,
when adding an operator, to be able to specify whether that operator is
accessible from outside and which (or all) of its properties are
accessible.
         b. Provide methods in ModuleMeta or elsewhere to set property
values by specifying the operator name (friendly name) inside the module
and the property name. If this is allowed by a. above it succeeds;
otherwise it should fail.
         c. Allow a syntax in property files to specify the property in b.
Example syntax:
dt.module.<modulename>.operator.<operatorname>.prop.<propname>
    4. For attributes, the same mechanism as in 3. should apply to the
operators that are exposed by the module. For property files, example
syntax: dt.module.<modulename>.operator.<operatorname>.attr.<attrname>
    5. Module developers may, in addition to 3. and 4. above, choose to
support module-level properties and attributes. These should not be the
default when 3. and 4. are possible, but a complement to them. In this
case, for properties they can implement setters and getters in the
module. For attributes, the user should still be able to set the
attributes using the DAG setAttribute method. You could introduce a method
in the module to process attributes, which the engine calls once
everything is set.
    6. Regarding 5. above, setting global properties and attributes for a
module is akin to ideas that have been proposed for the application as
well. A consistent approach should be possible for applications too, even
if it is not implemented now.
    7. For 5. or 6. above there should be a way to specify the global
module properties and attributes in a property file. Example syntax:
dt.module.<modulename>.prop.<propname>, dt.module.<modulename>.attr.<attrname>.
Note the difference from 3. c. and 4. above: there is no operator
keyword here.
    8. Partitioning needs to be consistent with what users will expect
when they see a module as an entity. I will send an image with examples
of how users would expect the physical plan to look in certain cases.
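
To make 3., 4. and 7. a bit more concrete, here is a rough Java sketch of
what I have in mind. The class names ModuleKey and ExposedProperties and
all of their methods are hypothetical; they are not part of the pull
request or of the existing API. The sketch also assumes that module and
operator names contain no dots. It only illustrates the access check from
3. a./b. and how the two property-file key forms could be told apart.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical: parsed form of a dt.module.* property-file key (3. c., 4. and 7.).
final class ModuleKey {
  final String module;
  final String operator; // null for module-level keys (no "operator" keyword)
  final String kind;     // "prop" or "attr"
  final String name;

  private ModuleKey(String module, String operator, String kind, String name) {
    this.module = module;
    this.operator = operator;
    this.kind = kind;
    this.name = name;
  }

  static ModuleKey parse(String key) {
    String[] p = key.split("\\.");
    if (p.length < 5 || !p[0].equals("dt") || !p[1].equals("module")) {
      throw new IllegalArgumentException("Not a module key: " + key);
    }
    boolean operatorScoped = p[3].equals("operator");
    int kindIndex = operatorScoped ? 5 : 3;
    if (p.length < kindIndex + 2) {
      throw new IllegalArgumentException("Incomplete module key: " + key);
    }
    String kind = p[kindIndex];
    if (!kind.equals("prop") && !kind.equals("attr")) {
      throw new IllegalArgumentException("Expected prop or attr in: " + key);
    }
    // Everything after the kind is the property/attribute name (may contain dots).
    String name = String.join(".", Arrays.copyOfRange(p, kindIndex + 1, p.length));
    return new ModuleKey(p[2], operatorScoped ? p[4] : null, kind, name);
  }
}

// Hypothetical: per-module record of what the module developer exposed (3. a.)
// and the values set from outside (3. b.).
final class ExposedProperties {
  private final Map<String, Set<String>> exposed = new HashMap<>();
  private final Map<String, String> values = new HashMap<>();

  // 3. a.: called while the module adds the operator; "*" exposes all properties.
  void expose(String operatorName, String... propertyNames) {
    exposed.computeIfAbsent(operatorName, k -> new HashSet<>())
        .addAll(Arrays.asList(propertyNames));
  }

  // 3. b.: succeeds only if 3. a. allowed it, otherwise fails.
  void set(String operatorName, String propertyName, String value) {
    Set<String> allowed = exposed.get(operatorName);
    if (allowed == null || !(allowed.contains("*") || allowed.contains(propertyName))) {
      throw new IllegalArgumentException(
          "Property not exposed: " + operatorName + "." + propertyName);
    }
    values.put(operatorName + "." + propertyName, value);
  }

  String get(String operatorName, String propertyName) {
    return values.get(operatorName + "." + propertyName);
  }
}
```

With something like this, a property-file entry would first go through
ModuleKey.parse; keys with an operator part would then be applied through
the exposure check, while keys without it would go to the module's own
setters or attributes as described in 5.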

Thanks
