Let's restart the discussion of this topic.

We'd like to break malhar into modules, so we can have separate artifacts
for kafka, cassandra, hbase, etc., instead of just malhar-contrib and
malhar-library.
This way users will automatically pull in only the dependencies they
need, without today's ugly business of optional dependencies and
exclusions.

Also, I propose adding the third-party version to the artifact name.  For
example:

malhar-kafka-0.8
malhar-kafka-0.9

so that we can simultaneously support multiple versions of kafka.
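
As a rough sketch of what an application pom might then contain (these
coordinates are hypothetical: the artifactId follows the naming proposed
above, while the groupId and version are simply assumed from the examples
quoted further down, and no such artifact is published yet):

```xml
<!-- Hypothetical coordinates, for illustration only. -->
<dependency>
  <groupId>com.datatorrent</groupId>
  <artifactId>malhar-kafka-0.8</artifactId>
  <version>3.0.0</version>
</dependency>
```

An application running against kafka 0.9 would presumably depend on
malhar-kafka-0.9 instead, with no exclusions needed in either case.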

Thoughts?

David

On Fri, Oct 2, 2015 at 4:40 PM, David Yan <[email protected]> wrote:

> All malhar operators are listed in the apidoc here:
> https://www.datatorrent.com/docs/apidocs/index.html
> Developers should be able to find the operators they need there.
>
> But it's linked from
> https://www.datatorrent.com/product-documentation/ as "Platform API
> Reference", so users may have trouble finding it.
>
> We should probably have separate javadoc pages for Apex Core and Apex
> Malhar, and also add the links to this page:
> http://apex.apache.org/docs.html
>
> David
>
> On Fri, Oct 2, 2015 at 4:28 PM, Pramod Immaneni <[email protected]>
> wrote:
>
>> We've got to think about how people can find the operators and
>> dependencies when bundling their applications. The complaint I often
>> hear is that folks can't find the operators they are looking for. We
>> should be careful about how much more work this adds for users, who
>> will now have to search for and find all the dependencies.
>>
>> Thanks
>>
>> > On Oct 2, 2015, at 3:44 PM, David Yan <[email protected]> wrote:
>> >
>> > I actually don't think it makes sense any more to separate
>> > malhar-library and malhar-contrib after the breakup, especially since
>> > we are planning a major release for these changes.
>> >
>> > People are often confused, myself included, about which operators
>> > should be in malhar-library and which should be in contrib.
>> > Requiring a separate setup for unit tests should not be a criterion,
>> > because the user of the library couldn't care less whether the unit
>> > tests require extra setup. Requiring extra dependencies isn't a valid
>> > criterion either, because malhar-library already has dependencies
>> > that apex does not have.
>> >
>> > We can retain them for backward compatibility, but going forward new
>> > app packages should only use the baby artifacts, without denoting
>> > whether they're contrib or not.
>> >
>> > David
>> >
>> > On Tue, Sep 29, 2015 at 12:19 AM, Andy Perlitch <[email protected]>
>> > wrote:
>> >
>> >> Hi all,
>> >>
>> >> This is a first cut at a plan to restructure malhar in a way that is
>> >> more portable and adherent to Maven's principles of modularity and
>> >> dependency management.
>> >>
>> >> Overview of Current Malhar Architecture
>> >> ---------------------------------------------------------------
>> >> The current malhar repo consists of several maven modules:
>> >>
>> >> * *malhar-library*
>> >>   operators which do not require additional transitive dependencies
>> >>   beyond what Apex and Hadoop require
>> >> * *malhar-contrib*
>> >>   operators requiring other maven dependencies
>> >> * *malhar-demos*
>> >>   demo applications
>> >> * *malhar-samples*
>> >>   sample code showing example usage of malhar operators
>> >> * *malhar-apps*
>> >>   apex applications (currently only logstream)
>> >>
>> >>
>> >> Proposed Changes
>> >> ---------------------------------------------------------------
>> >>
>> >> 1. *Scrub malhar-library for any operators needing additional
>> >>    dependencies*
>> >>    `malhar-library` is intended to consist only of operators without
>> >>    extra transitive dependencies. All operators should be checked for
>> >>    whether they need extra dependencies.
>> >>
>> >> 2. *Move operators from malhar-demos and malhar-apps into contrib (or
>> >>    library if prudent)*
>> >>    There are various operators in both of these modules that are
>> >>    general enough to move into library or contrib.
>> >>
>> >> 3. *Create modules for all contrib subfolders*
>> >>    All folders under `contrib/src/main/com/datatorrent/contrib/`
>> >>    should be converted to modules of contrib and listed as such in
>> >>    `/contrib/pom.xml`. Additionally, each of these smaller contrib
>> >>    modules will have its own version and dependencies.
>> >>
>> >> 4. *Use the Maven Shade Plugin to allow for backwards-compatible
>> >>    fully-qualified class names*
>> >>    This is made possible by the Shade plugin's class relocation
>> >>    feature:
>> >>    <https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html>
>> >>    This might be a bit error-prone, as well as confusing for outside
>> >>    developers to use, but it must be done if these changes are to be
>> >>    made prior to a major release.
>> >>
>> >>
>> >>
>> >> Let me know what you all think of this approach.
>> >>
>> >> Best,
>> >> Andy
>> >>
>> >>
>> >> On Tue, Sep 22, 2015 at 11:20 AM, Chetan Narsude <
>> [email protected]>
>> >> wrote:
>> >>
>> >>> +1
>> >>>
>> >>> On Tue, Sep 22, 2015 at 11:08 AM, Gaurav Gupta <
>> [email protected]>
>> >>> wrote:
>> >>>
>> >>>> I agree with David. Each artifact should have its own version.
>> >>>>
>> >>>> Thanks
>> >>>> -Gaurav
>> >>>>
>> >>>>> On Tue, Sep 22, 2015 at 11:07 AM, David Yan <[email protected]>
>> >>>> wrote:
>> >>>>
>> >>>>> I actually think that each baby artifact should have its own
>> >>>>> version, because each artifact has its own interface and its own
>> >>>>> life cycle. Especially after we break up the giant library,
>> >>>>> applications will depend on the baby artifacts instead of the
>> >>>>> giant library. For example, if there is no change in
>> >>>>> malhar-contrib-kafka (I think the name should actually be
>> >>>>> apex-malhar-kafka), we should not confuse users by bumping the
>> >>>>> version.
>> >>>>>
>> >>>>> David
>> >>>>>
>> >>>>> On Tue, Sep 22, 2015 at 9:03 AM, Andy Perlitch <
>> [email protected]
>> >>>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Tushar,
>> >>>>>>
>> >>>>>> I agree that all modules should inherit the version from the
>> >>>>>> "parent pom" of the malhar repo. I think the benefits outweigh the
>> >>>>>> cost of bumping versions of components that haven't actually
>> >>>>>> changed. I'd love to get others' feedback on this as well.
>> >>>>>>
>> >>>>>> On another note, I plan on starting a spreadsheet/googledoc with
>> >>>>>> the possible groupings of operators into these modules. Stay
>> >>>>>> tuned...
>> >>>>>>
>> >>>>>> -Andy
>> >>>>>>
>> >>>>>> On Mon, Sep 21, 2015 at 11:51 PM, Tushar Gosavi <
>> >>>> [email protected]>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> +1 for the general idea
>> >>>>>>>
>> >>>>>>> Are these independent modules going to have independent
>> >>>>>>> versions? For example, if there is no change in the kafka
>> >>>>>>> operator between malhar 3.0 and malhar 4.0, will we increment
>> >>>>>>> the version of malhar-contrib-kafka to 4.0? I have learned from
>> >>>>>>> my previous project that it is easier to manage versions if we
>> >>>>>>> keep all modules at the same version level for a release, even
>> >>>>>>> if there is no change in a particular module.
>> >>>>>>>
>> >>>>>>> - Tushar.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Fri, Sep 18, 2015 at 12:18 AM, Timothy Farkas <
>> >>>> [email protected]>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> I agree Andy's solution is better, but just for the sake of
>> >>>>>>>> argument: profiles can be inherited from a parent pom, so if
>> >>>>>>>> the maven archetype defines a new project with a parent pom
>> >>>>>>>> that has the correct profiles defined, then the desired
>> >>>>>>>> profiles can be activated in the pom of the new project. It is
>> >>>>>>>> no more complicated than adding additional dependencies to
>> >>>>>>>> your project.
>> >>>>>>>>
>> >>>>>>>> On Thu, Sep 17, 2015 at 10:32 AM, Sandesh Hegde <
>> >>>>>> [email protected]
>> >>>>>>>>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Currently all the dependencies in Malhar-Contrib are marked as
>> >>>>>>>>> optional, so users already have to modify their existing POM
>> >>>>>>>>> to use it in their project. So restructuring should be fine.
>> >>>>>>>>>
>> >>>>>>>>> On Thu, Sep 17, 2015 at 11:29 AM Chetan Narsude <
>> >>>>>>> [email protected]>
>> >>>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Profiles are excellent when you are developing
>> >>>>>>>>>> malhar-contrib. Profiles do not work when you are using
>> >>>>>>>>>> malhar-contrib. The problem Andy is trying to solve is the
>> >>>>>>>>>> latter. If there is an elegant solution using profiles that
>> >>>>>>>>>> I am missing, please correct me.
>> >>>>>>>>>>
>> >>>>>>>>>> The way Andy suggested is the way many successful projects
>> >>>>>>>>>> do it. Look at Netty as an example.
>> >>>>>>>>>>
>> >>>>>>>>>> +1 for that.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> --
>> >>>>>>>>>> Chetan
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, Sep 17, 2015 at 11:22 AM, Timothy Farkas <
>> >>>>>>> [email protected]>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> I think restructuring the project in that way would be the
>> >>>>>>>>>>> technically correct thing to do, but if people are
>> >>>>>>>>>>> unwilling to accept the change in project structure, you
>> >>>>>>>>>>> could achieve something similar by using maven profiles.
>> >>>>>>>>>>> With profiles the project structure would remain as is.
>> >>>>>>>>>>> Profiles could be added to the malhar pom, and each profile
>> >>>>>>>>>>> would define the dependencies needed for a different type
>> >>>>>>>>>>> of operator. For example, the hbase profile would define
>> >>>>>>>>>>> the dependencies for the hbase operator. Then any project
>> >>>>>>>>>>> using a malhar library would just activate the correct
>> >>>>>>>>>>> profile in its pom, and the correct dependencies would be
>> >>>>>>>>>>> pulled in.
>> >>>>>>>>>>>
>> >>>>>>>>>>> http://maven.apache.org/guides/introduction/introduction-to-profiles.html
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, Sep 17, 2015 at 10:01 AM, Andy Perlitch <
>> >>>>>>>> [email protected]>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hi everyone,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I am currently assigned to MLHR-1843
>> >>>>>>>>>>>> <https://malhar.atlassian.net/browse/MLHR-1843>, which
>> >>>>>>>>>>>> essentially aims to expose smaller, more consumable maven
>> >>>>>>>>>>>> artifacts that would do away with the need to manually
>> >>>>>>>>>>>> include necessary dependencies based on the operators in
>> >>>>>>>>>>>> use.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> As an example, say I am building an app package that needs
>> >>>>>>>>>>>> Kafka input and output operators, but I don't want all the
>> >>>>>>>>>>>> other transitive dependencies that come via
>> >>>>>>>>>>>> malhar-contrib. Currently I would need to specify
>> >>>>>>>>>>>> malhar-contrib as a dependency, and add an exclusions
>> >>>>>>>>>>>> block in my app package pom:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <dependency>
>> >>>>>>>>>>>>   <groupId>com.datatorrent</groupId>
>> >>>>>>>>>>>>   <artifactId>malhar-contrib</artifactId>
>> >>>>>>>>>>>>   <version>3.0.0</version>
>> >>>>>>>>>>>>   <!-- so none of malhar-contrib's deps are included -->
>> >>>>>>>>>>>>   <exclusions>
>> >>>>>>>>>>>>     <exclusion>
>> >>>>>>>>>>>>       <groupId>*</groupId>
>> >>>>>>>>>>>>       <artifactId>*</artifactId>
>> >>>>>>>>>>>>     </exclusion>
>> >>>>>>>>>>>>   </exclusions>
>> >>>>>>>>>>>> </dependency>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Then, I would have to include the kafka library
>> >>>>>>>>>>>> explicitly as a dependency:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <dependency>
>> >>>>>>>>>>>>   <groupId>org.apache.kafka</groupId>
>> >>>>>>>>>>>>   <artifactId>kafka_2.10</artifactId>
>> >>>>>>>>>>>>   <version>0.8.1.1</version>
>> >>>>>>>>>>>> </dependency>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Wouldn't it be nice if I could just put this in my pom?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <dependency>
>> >>>>>>>>>>>>   <groupId>com.datatorrent</groupId>
>> >>>>>>>>>>>>   <artifactId>malhar-contrib-kafka</artifactId>
>> >>>>>>>>>>>>   <version>3.0.0</version>
>> >>>>>>>>>>>> </dependency>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> In order to make this possible, we will need to organize
>> >>>>>>>>>>>> the malhar project into more granular modules (artifacts).
>> >>>>>>>>>>>> Specifically, the malhar-contrib artifact would
>> >>>>>>>>>>>> essentially just be a pom that specifies each smaller
>> >>>>>>>>>>>> module as a dependency:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <!-- in malhar-contrib's pom.xml: -->
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <modules>
>> >>>>>>>>>>>>   <module>kafka</module>
>> >>>>>>>>>>>>   <module>twitter</module>
>> >>>>>>>>>>>>   <module>redis</module>
>> >>>>>>>>>>>>   <!-- other smaller modules -->
>> >>>>>>>>>>>> </modules>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <dependency>
>> >>>>>>>>>>>>   <groupId>com.datatorrent</groupId>
>> >>>>>>>>>>>>   <artifactId>malhar-contrib-kafka</artifactId>
>> >>>>>>>>>>>>   <version>3.0.0</version>
>> >>>>>>>>>>>> </dependency>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <dependency>
>> >>>>>>>>>>>>   <groupId>com.datatorrent</groupId>
>> >>>>>>>>>>>>   <artifactId>malhar-contrib-twitter</artifactId>
>> >>>>>>>>>>>>   <version>3.0.0</version>
>> >>>>>>>>>>>> </dependency>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <dependency>
>> >>>>>>>>>>>>   <groupId>com.datatorrent</groupId>
>> >>>>>>>>>>>>   <artifactId>malhar-contrib-redis</artifactId>
>> >>>>>>>>>>>>   <version>3.0.0</version>
>> >>>>>>>>>>>> </dependency>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> With these changes, there may be a risk of breaking
>> >>>>>>>>>>>> backwards compatibility; however, I think the gain in
>> >>>>>>>>>>>> usability of malhar merits the effort to make this work.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I am still relatively new to maven, so I would love to get
>> >>>>>>>>>>>> some feedback from other devs about this!
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> --
>> >>>>>>>>>>>> Regards,
>> >>>>>>>>>>>> Andy Perlitch
>> >>>>>>>>>>>> Software Engineer
>> >>>>>>>>>>>> DataTorrent Inc
>> >>>>>>>>>>>> (408)829-9319
