Hi,
Yes, I suppose that is true too. But having individual modules for each
result in too many modules in Malhar. Is that OK?
I am open to any way of organizing which stays consistent with rest of the
Malhar repository. From what I see there are 3 options here:
1. Create fine-grained modules
- In this particular example, we can have a transformations
folder in Malhar and inside it three separate maven modules for inidividual
parsers. The abstract classes and interfaces can still go in malhar-library
as they do not have any dependencies
- Pros: Only relevant dependencies will be part of the project
- Cons: Total number of modules will increase
2. Create modules by clubbing based on functionality
- Here it means all parsers go together in one module
- Pros: Easier to find operators as located in one package
- Cons: Unnecessary dependencies may be pulled in project.
3. Keep all operators not having any external library dependencies in
malhar-library and all others can be created as modules
- This would mean XML parser, which has only JDK dependencies will
be in malhar-lib along with abstract classes. Json (depends on Jackson,
jakson is part of malhar-library at this point but I don't know if there is
a plan to remove it) and csv (depends on supercsv) can be created as
individual modules.
- Pros: Minimizes the number of modules we will create in Malhar.
Also makes it possible to load only required dependencies in project.
- Cons: Makes it hard to find which parsers are available vs which
are not. And I think this organization is little awkward and non-intuitive.
I am not sure which is the best way to organize. I just think it should
be consistent across. Thoughts?
Thanks,
Isha
On Thu, Dec 24, 2015 at 12:25 PM, Thomas Weise <[email protected]>
wrote:
> Most users are not interested using all parsers and their dependencies in
> their project. They have a specific format, for which they will need a
> parser. When I have xml files, I don't need the dependencies for csv.
>
> Thomas
>
> On Thu, Dec 24, 2015 at 12:22 PM, Isha Arkatkar <[email protected]>
> wrote:
>
> > Hi,
> >
> > Agree with Shubham's point here as well. The parser split in contrib
> and
> > library is probably harder to find and manage going forward. However, I
> > think increasing dependencies in library also has its downside.
> >
> > As mentioned in separate mail thread on splitting malhar into modules,
> can
> > we create a single module for parsers outside library and contrib? I can
> > close the current pull request and open a new one if that makes more
> sense.
> >
> > Thanks,
> > Isha
> >
> >
> > On Wed, Dec 23, 2015 at 11:00 PM, Shubham Pathak <
> [email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > Can we take a decision regarding whether all parsers need to be in one
> > > place ? ( either malhar lib / contrib )
> > >
> > > In this PR https://github.com/apache/incubator-apex-malhar/pull/137 we
> > are
> > > moving XML and JSON parser from contrib to lib.
> > > I suggest we move CSV parsers as well .
> > >
> > > I understand Thomas's point of dependencies, but shouldn't we also
> think
> > > about users that use malhar that would wish to have all parsers under
> > > common bucket ?
> > >
> > >
> > > Thanks,
> > > Shubham
> > >
> > >
> > > On Fri, Dec 18, 2015 at 2:44 AM, Thomas Weise <[email protected]>
> > > wrote:
> > >
> > > > This has changed over time. Pieces that cannot be covered by unit
> tests
> > > > should still be kept out of library.
> > > >
> > > > In addition, we cannot keep on expanding the number of dependencies
> in
> > a
> > > > single monolithic module. Hence, we are going start to create smaller
> > > > modules with their dependencies and tests setup correctly. The first
> of
> > > > those will be Kafka, for which a PR is currently open.
> > > >
> > > > For lib, we can add more operators as long as they don't introduce
> new
> > > > dependencies. When building an app and using one operator from lib,
> > > > everything else comes with it. That's a problem for the application
> > > > assembly we are trying to address.
> > > >
> > > > Thomas
> > > >
> > > >
> > > > On Thu, Dec 17, 2015 at 1:06 PM, Sandesh Hegde <
> > [email protected]>
> > > > wrote:
> > > >
> > > > > Even my understanding was along the lines, "if an unit test can't
> be
> > > run
> > > > on
> > > > > a developer machine without installing dependencies (
> RabbitMq/redis
> > > ec),
> > > > > it should go into contrib".
> > > > >
> > > > > Documenting the exact process as to what goes where helps.
> > > > >
> > > > > On Thu, Dec 17, 2015 at 12:16 PM Chinmay Kolhatkar <
> > > > > [email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi Isha,
> > > > > >
> > > > > > I think what Shubham meant to say is any operator which has a
> > > > dependency
> > > > > on
> > > > > > external entity (not library) should go in contrib. Others should
> > go
> > > in
> > > > > > library.
> > > > > >
> > > > > > For eg. DB related operators needs an instance/server of respect
> > > > database
> > > > > > to be running. Without that running server, the operator cannot
> > > satisfy
> > > > > the
> > > > > > functionality expected. Hence it goes into contrib.
> > > > > >
> > > > > > I think none of the parsers will have dependency on external
> > entity,
> > > > > hence
> > > > > > they should go in library and not contrib.
> > > > > >
> > > > > > - Chinmay.
> > > > > > On 18 Dec 2015 00:42, "Isha Arkatkar" <[email protected]>
> > wrote:
> > > > > >
> > > > > > > Hi Shubham,
> > > > > > >
> > > > > > > I think it is somewhat subjective what goes in Contrib Vs
> > > > > Library.
> > > > > > > Here is what I understood about general guideline:
> > > > > > > An operator would go in malhar-library if it does not
> > have
> > > > any
> > > > > > > other library dependency than what is already available. If
> > > operator
> > > > > > needs
> > > > > > > to include a dependency, it could be added to Contrib but not
> > > > library.
> > > > > > >
> > > > > > > If we add an external library dependency in
> Malhar-library,
> > > the
> > > > > size
> > > > > > > of the lib jar keeps on growing. So, if we refer malhar-lib in
> a
> > > > > project,
> > > > > > > the size of the total package would increase, even if we may
> not
> > > > > directly
> > > > > > > use all external libs.
> > > > > > >
> > > > > > > For this reason, in pull request #137
> > > > > > > <https://github.com/apache/incubator-apex-malhar/pull/137> , I
> > > move
> > > > > only
> > > > > > > Json and XML parsers to malhar-library. Csv one had library
> > > reference
> > > > > to
> > > > > > > supercsv, so it is still in contrib.
> > > > > > >
> > > > > > > Please correct if I missed something. :)
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Isha
> > > > > > >
> > > > > > > On Wed, Dec 16, 2015 at 11:27 PM, Shubham Pathak <
> > > > > > [email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > What do we mean by "additional dependency" in case of
> deciding
> > > > > contrib
> > > > > > vs
> > > > > > > > lib ?
> > > > > > > > As i understand, "additional dependency" is when an operator
> > is
> > > > > > > > interacting with external technologies . For e.g kafka, MQ ,
> > > HBase
> > > > > etc.
> > > > > > > > By this definition even CSV parsers need to be moved to
> > > malhar-lib.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Shubham
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Dec 17, 2015 at 5:39 AM, Isha Arkatkar <
> > > > [email protected]
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Chandni,
> > > > > > > > >
> > > > > > > > > I have moved that as well to lib, as the parsers
> depended
> > on
> > > > > that.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Isha
> > > > > > > > >
> > > > > > > > > On Wed, Dec 16, 2015 at 1:01 PM, Chandni Singh <
> > > > > > > [email protected]>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > There is a converter package under
> com.datatorrent.contrib
> > > > which
> > > > > > has
> > > > > > > a
> > > > > > > > > > Converter API. This belongs in library as well.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Chandni
> > > > > > > > > >
> > > > > > > > > > On Tue, Dec 15, 2015 at 1:29 PM, Chandni Singh <
> > > > > > > > [email protected]>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Isha,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for moving this. When you move these files,
> please
> > > > place
> > > > > > > then
> > > > > > > > > > under
> > > > > > > > > > > a package which reflects its functionality. I don't see
> > the
> > > > > need
> > > > > > > for
> > > > > > > > > > > package called schema.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Chandni
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Dec 15, 2015 at 12:31 PM, Isha Arkatkar <
> > > > > > > > [email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> Hi,
> > > > > > > > > > >>
> > > > > > > > > > >> For csv parser there is an additional dependency.
> So,
> > > I'll
> > > > > > move
> > > > > > > > only
> > > > > > > > > > >> json
> > > > > > > > > > >> and xml to new location.
> > > > > > > > > > >>
> > > > > > > > > > >> Thanks,
> > > > > > > > > > >> Isha
> > > > > > > > > > >>
> > > > > > > > > > >> On Tue, Dec 15, 2015 at 11:42 AM, Thomas Weise <
> > > > > > > > > [email protected]>
> > > > > > > > > > >> wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> > As long as the operators don't introduce additional
> > > > > > dependencies
> > > > > > > > > they
> > > > > > > > > > >> > should be in lib.
> > > > > > > > > > >> >
> > > > > > > > > > >> >
> > > > > > > > > > >> > On Tue, Dec 15, 2015 at 9:34 AM, Shubham Pathak <
> > > > > > > > > > >> [email protected]>
> > > > > > > > > > >> > wrote:
> > > > > > > > > > >> >
> > > > > > > > > > >> > > Hi Chandni,
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > I had written those operators.
> > > > > > > > > > >> > > Here is the jira for that
> > > > > > > > > > >> https://malhar.atlassian.net/browse/MLHR-1838
> > > > > > > > > > >> > > You would find the entire discussion there.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > Why are all these operator under Malhar/contrib
> and
> > > not
> > > > > > > > Malhar/lib
> > > > > > > > > > >> > > When i was writing the code i saw
> AbstractCsvParser
> > in
> > > > > > > contriib
> > > > > > > > > and
> > > > > > > > > > >> hence
> > > > > > > > > > >> > > added there.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > Recently i got to know which operators must go in
> > > > contrib
> > > > > > and
> > > > > > > > what
> > > > > > > > > > >> must
> > > > > > > > > > >> > go
> > > > > > > > > > >> > > in lib.
> > > > > > > > > > >> > > By that definition, these operators must belong to
> > > lib.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > Thanks,
> > > > > > > > > > >> > > Shubham
> > > > > > > > > > >> > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > On Tue, Dec 15, 2015 at 1:32 PM, Chandni Singh <
> > > > > > > > > > >> [email protected]>
> > > > > > > > > > >> > > wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > > Hi,
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > I just came across couple of formatter and
> parser
> > > > > > operators
> > > > > > > > > which
> > > > > > > > > > >> are
> > > > > > > > > > >> > > under
> > > > > > > > > > >> > > > Malhar/contrib/schema.
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > I have couple of questions:
> > > > > > > > > > >> > > > 1. What does schema denote here?
> > > > > > > > > > >> > > > 2. Why formatter/parser which are functions are
> > > placed
> > > > > > under
> > > > > > > > > > schema
> > > > > > > > > > >> > > > package?
> > > > > > > > > > >> > > > 2. Why are all these operator under
> Malhar/contrib
> > > and
> > > > > not
> > > > > > > > > > >> Malhar/lib
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > >> > > > Chandni
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>