As part of the change, I recommend separating the benchmark application from operator certification and moving the benchmark to the Apex core.

Thank you,

Vlad

On 9/29/15 00:19, Andy Perlitch wrote:
Hi all,

This is a first cut at a plan to restructure malhar in a way that is more
portable and adherent to Maven's principles of modularity and dependency
management.

Overview of Current Malhar Architecture
---------------------------------------------------------------
The current malhar repo consists of several maven modules:

* *malhar-library*
    operators which do not require additional transitive dependencies beyond
    what Apex and Hadoop require
* *malhar-contrib*
    operators requiring other maven dependencies
* *malhar-demos*
    demo applications
* *malhar-samples*
    sample code showing example usage of malhar operators
* *malhar-apps*
    apex applications (currently only logstream)


Proposed Changes
---------------------------------------------------------------

1. *Scrub malhar-library for any operators needing additional dependencies*
   `malhar-library` is intended to consist of only operators without extra
transitive dependencies. All operators should be checked for the necessity
of extra dependencies.

2. *Move operators from malhar-demos and malhar-apps into contrib (or
library if prudent)*
     There are various operators in both of these modules that are general
enough to move into library or contrib.

3. *Create modules for all contrib subfolders*
     All folders under `contrib/src/main/com/datatorrent/contrib/` should be
converted to modules of contrib and listed as such in `/contrib/pom.xml`.
     Additionally, each of these smaller contrib modules will have its own
version and dependencies.

4. *Use the Maven Shade Plugin to allow for backwards-compatible fully-qualified
class names*
     This is made possible by the Shade plugin's class relocation
<https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html>
feature. This might be a bit error-prone as well as confusing for
outside developers, but it must be done if these changes are to be made
prior to a major release.
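
As a rough illustration of item 4, the relocation could be configured along these lines. This is only a sketch under assumptions not stated above: `com.datatorrent.malhar.kafka` is a hypothetical new package for the Kafka operators, and the assumed goal is that the published jar keeps exposing them under the existing `com.datatorrent.contrib.kafka` package. The plugin version is omitted here and should be pinned in a real build.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <!-- run the shade goal when the module is packaged -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <relocation>
                <!-- hypothetical new package name (assumption) -->
                <pattern>com.datatorrent.malhar.kafka</pattern>
                <!-- package name existing users compile against (assumed) -->
                <shadedPattern>com.datatorrent.contrib.kafka</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Each moved package would need its own `<relocation>` entry, which is part of why this approach may be error-prone.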



Let me know what you all think of this approach.

Best,
Andy


On Tue, Sep 22, 2015 at 11:20 AM, Chetan Narsude <[email protected]>
wrote:

+1

On Tue, Sep 22, 2015 at 11:08 AM, Gaurav Gupta <[email protected]>
wrote:

I agree with David. Each artifact should have its own version.

Thanks
-Gaurav

On Tue, Sep 22, 2015 at 11:07 AM, David Yan <[email protected]>
wrote:
I actually think that each baby artifact should have its own version,
because each artifact has its own interface and its own life cycle.
Especially after we break up the giant library, applications will depend on
the baby artifacts instead of the giant library. For example, if there is
no change in malhar-contrib-kafka (I think the name should actually be
apex-malhar-kafka), we should not confuse users by bumping the version.

David

On Tue, Sep 22, 2015 at 9:03 AM, Andy Perlitch <[email protected]>
wrote:

Tushar,

I agree that all modules should inherit the version from the "parent pom"
of the malhar repo. I think the benefits outweigh the cost of bumping
versions of components that haven't actually changed. I'd love to get
others' feedback on this as well.
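
To make the trade-off concrete, here is a minimal sketch of a child module pom that inherits its version from the parent. The coordinates reuse names mentioned in this thread (`malhar-contrib`, `malhar-contrib-kafka`, 3.0.0); the file location is an assumption.

```xml
<!-- contrib/kafka/pom.xml (path is an assumption) -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>com.datatorrent</groupId>
    <artifactId>malhar-contrib</artifactId>
    <version>3.0.0</version>
  </parent>

  <!-- no <version> element: the module inherits 3.0.0 from the parent -->
  <artifactId>malhar-contrib-kafka</artifactId>
</project>
```

Giving each artifact its own version, as David suggests, would mean adding an explicit `<version>` element to each module pom instead of relying on inheritance.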

On another note, I plan on starting a spreadsheet/googledoc with the
possible groupings of operators into these modules. Stay tuned...

-Andy

On Mon, Sep 21, 2015 at 11:51 PM, Tushar Gosavi <
[email protected]>
wrote:

+1 for the general idea

Are these independent modules going to have independent versions? For
example, if there is no change in the kafka operator between malhar 3.0 and
malhar 4.0, will we increment the version of malhar-contrib-kafka to 4.0? I
have learned from my previous project that it is easier to manage versions
if we keep all modules at the same version level for a release, even if there
is no change in a particular module.

- Tushar.



On Fri, Sep 18, 2015 at 12:18 AM, Timothy Farkas <
[email protected]>
wrote:

I agree Andy's solution is better, but just for the sake of argument,
profiles can be inherited from a parent pom. So if the maven archetype
defines a new project with a parent pom with the correct profiles defined,
then the desired profiles can be activated in the pom of the new project.
It is no more complicated than adding additional dependencies to your
project.

On Thu, Sep 17, 2015 at 10:32 AM, Sandesh Hegde <
[email protected]>
wrote:

Currently all the dependencies in malhar-contrib are marked as optional, so
users already have to modify their existing POM to use it in their project.
So restructuring should be fine.
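
For readers less familiar with optional dependencies, the situation described above looks roughly like this. The Kafka coordinates are the ones quoted in Andy's original message; that this exact entry appears in malhar-contrib's pom is an assumption.

```xml
<!-- in malhar-contrib's pom.xml (illustrative entry, assumed) -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>0.8.1.1</version>
  <!-- optional: not passed on transitively to projects depending on malhar-contrib -->
  <optional>true</optional>
</dependency>
```

Because optional dependencies are not transitive, a project that uses the Kafka operators has to declare the kafka dependency again in its own pom.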

On Thu, Sep 17, 2015 at 11:29 AM Chetan Narsude <
[email protected]>
wrote:

The profiles are excellent when you are developing malhar-contrib. Profiles
do not work when you are using malhar-contrib. The problem Andy is trying
to solve is the latter. If there is an elegant solution using profiles which
I am missing, please correct me.

The way Andy suggested is the way many successful projects do it. Look at
Netty as an example.

+1 for that.


--
Chetan



On Thu, Sep 17, 2015 at 11:22 AM, Timothy Farkas <
[email protected]>
wrote:

I think restructuring the project in that way would be the technically
correct thing to do, but if people are unwilling to accept the change in
project structure you could achieve something similar by using maven
profiles. With profiles the project structure would remain as is. Profiles
could be added to the malhar pom, and a profile would define the
dependencies needed for different types of operators. For example, the hbase
profile would define the dependencies for the hbase operator. Then any
project using a malhar library would just activate the correct profile in
its pom, and the correct dependencies would be pulled in.


http://maven.apache.org/guides/introduction/introduction-to-profiles.html
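
A sketch of what such a profile might look like in a parent pom. The profile id `hbase` follows the example above; the `hbase-client` artifact and the `hbase.version` property are assumptions chosen for illustration.

```xml
<profiles>
  <profile>
    <id>hbase</id>
    <dependencies>
      <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <!-- hbase.version is a hypothetical property defined elsewhere -->
        <version>${hbase.version}</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

A project inheriting from that parent could then activate the profile on the command line, for example with `mvn -Phbase package`.
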
On Thu, Sep 17, 2015 at 10:01 AM, Andy Perlitch <
[email protected]>
wrote:

Hi everyone,

I am currently assigned to MLHR-1843
<https://malhar.atlassian.net/browse/MLHR-1843>, which essentially aims to
expose smaller, more consumable maven artifacts that would do away with the
need to manually include necessary dependencies based on the operators in
use.

As an example, say I am building an app package that needs Kafka input and
output operators, but I don't want all the other transitive dependencies
that come via malhar-contrib. Currently I would need to specify
malhar-contrib as a dependency, and add an exclusions block in my app
package pom:

<dependency>
  <groupId>com.datatorrent</groupId>
  <artifactId>malhar-contrib</artifactId>
  <version>3.0.0</version>
  <!-- so none of malhar-contrib's deps are included -->
  <exclusions>
    <exclusion>
      <groupId>*</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

Then, I would have to include the kafka library explicitly as a
dependency:

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>0.8.1.1</version>
</dependency>

Wouldn't it be nice if I could just put this in my pom?

<dependency>
  <groupId>com.datatorrent</groupId>
  <artifactId>malhar-contrib-kafka</artifactId>
  <version>3.0.0</version>
</dependency>


In order to make this possible, we will need to organize the malhar project
into more granular modules (artifacts). Specifically, the malhar-contrib
artifact would essentially just be a pom that specifies each smaller module
as a dependency:

<!-- in malhar-contrib's pom.xml: -->

<modules>
  <module>kafka</module>
  <module>twitter</module>
  <module>redis</module>
  <!-- other smaller modules -->
</modules>

<dependency>
  <groupId>com.datatorrent</groupId>
  <artifactId>malhar-contrib-kafka</artifactId>
  <version>3.0.0</version>
</dependency>

<dependency>
  <groupId>com.datatorrent</groupId>
  <artifactId>malhar-contrib-twitter</artifactId>
  <version>3.0.0</version>
</dependency>

<dependency>
  <groupId>com.datatorrent</groupId>
  <artifactId>malhar-contrib-redis</artifactId>
  <version>3.0.0</version>
</dependency>

With these changes, there may be a risk of breaking backwards
compatibility; however, I think the gain in usability of malhar merits the
effort to make this work.

I am still relatively new to maven, so I would love to get some feedback
from other devs about this!

--
Regards,
Andy Perlitch
Software Engineer
DataTorrent Inc
(408)829-9319






