For new code we should use org.apache.apex

I prefer not to use "module" in the package name, but to keep modules together
with related operators (modules and operators are no different from the user's
perspective).

On Wed, Mar 2, 2016 at 9:59 PM, Chinmay Kolhatkar <[email protected]>
wrote:

> +1 for a separate namespace for modules.
>
> On Thu, Mar 3, 2016 at 10:58 AM, Priyanka Gugale <[email protected]>
> wrote:
>
> > That is also an option, but then I have a question: do we want to treat
> > modules separately, or is a module just a type of operator, maybe a super
> > operator?
> > Also, I believe it would be better to have feature-wise packages than to
> > use our custom terms to create packages, so anyone can easily locate the
> > classes.
> >
> >
> > -Priyanka
> >
> > On Thu, Mar 3, 2016 at 12:20 AM, Sandesh Hegde <[email protected]>
> > wrote:
> >
> > > My vote is to have a separate namespace for modules.
> > >
> > > Is it time to introduce
> > > org.apache.apex.module.io.fs ?
> > >
> > > On Wed, Mar 2, 2016 at 3:25 AM Priyanka Gugale <[email protected]>
> > > wrote:
> > >
> > > > I am planning to put this module in the malhar-library project, in
> > > > package: com.datatorrent.lib.io.fs
> > > > Let me know if this is acceptable.
> > > >
> > > > -Priyanka
> > > >
> > > > On Tue, Feb 23, 2016 at 6:45 PM, Priyanka Gugale <[email protected]>
> > > > wrote:
> > > >
> > > > > I haven't created any branch yet; I will share it with you as soon as
> > > > > I add the code for the module.
> > > > > I would surely be happy to help :)
> > > > >
> > > > > -Priyanka
> > > > >
> > > > > On Tue, Feb 23, 2016 at 6:26 PM, Yogi Devendra <[email protected]>
> > > > > wrote:
> > > > >
> > > > >> Priyanka,
> > > > >>
> > > > >> Thanks for the update. I will consider these ports during the design
> > > > >> phase of my proposal for the HDFS file copy module.
> > > > >>
> > > > >> I believe you are planning to add this to Apex Malhar. Please post any
> > > > >> link / private branch (if any) where I can have a look at the first
> > > > >> cut.
> > > > >>
> > > > >> I will ask for your help if I come across any questions, uncertainties,
> > > > >> etc.
> > > > >>
> > > > >> ~ Yogi
> > > > >>
> > > > >> On 23 February 2016 at 17:59, Priyanka Gugale <[email protected]>
> > > > >> wrote:
> > > > >>
> > > > >> > I am planning to have the following ports on this module:
> > > > >> >
> > > > >> > Ports
> > > > >> > Input port: None
> > > > >> >
> > > > >> > Output port:
> > > > >> >
> > > > >> >    1. FileMetadata
> > > > >> >    2. BlockMetadata
> > > > >> >    3. Block bytes
> > > > >> >
> > > > >> > -Priyanka
> > > > >> >
> > > > >> > On Tue, Feb 23, 2016 at 2:16 PM, Yogi Devendra <[email protected]>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Priyanka,
> > > > >> > >
> > > > >> > > Can you please share details about what the output ports from this
> > > > >> > > module would be?
> > > > >> > >
> > > > >> > > I am thinking of an HDFS File Copy Module which can be used in
> > > > >> > > conjunction with this module to copy files from HDFS to HDFS.
> > > > >> > >
> > > > >> > > ~ Yogi
> > > > >> > >
> > > > >> > > On 18 February 2016 at 10:29, Mohit Jotwani <[email protected]>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > +1 to add this.
> > > > >> > > >
> > > > >> > > > Regards,
> > > > >> > > > Mohit
> > > > >> > > > On 17 Feb 2016 23:30, "Pramod Immaneni" <[email protected]>
> > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > > > +1 to add this module
> > > > >> > > > >
> > > > >> > > > > On Wed, Feb 17, 2016 at 9:21 AM, Priyanka Gugale <[email protected]>
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > We need partitions for parallel read, but how will a reader
> > > > >> > > > > > partition know which offset of the file it should read from?
> > > > >> > > > > > Normally FileSplitter creates this metadata (let's call each
> > > > >> > > > > > piece a reader task) and forwards it to the next operator,
> > > > >> > > > > > which is the block reader. A block reader receives one of the
> > > > >> > > > > > tasks and reads from the specified offset in the file. If
> > > > >> > > > > > FileSplitter is absent, one reader partition has to consume
> > > > >> > > > > > one file entirely, which means we can't have parallel reading
> > > > >> > > > > > over one file. I hope this answers your question.
> > > > >> > > > > >
> > > > >> > > > > > The advantage of this module is having a reusable component
> > > > >> > > > > > made up of operators which are frequently used together to do
> > > > >> > > > > > file reading.
> > > > >> > > > > >
> > > > >> > > > > > -Priyanka
> > > > >> > > > > >
> > > > >> > > > > > On Wed, Feb 17, 2016 at 11:31 AM, Yogi Devendra <[email protected]>
> > > > >> > > > > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Let me rephrase Ram's question to make it clear:
> > > > >> > > > > > >
> > > > >> > > > > > > For an application developer using Malhar:
> > > > >> > > > > > > What are the advantages / disadvantages of using the proposed
> > > > >> > > > > > > HDFS File input Module as compared to directly using the
> > > > >> > > > > > > FileSplitter and BlockReader operators available in Malhar?
> > > > >> > > > > > >
> > > > >> > > > > > > ~ Yogi
> > > > >> > > > > > >
> > > > >> > > > > > > On 16 February 2016 at 21:56, Munagala Ramanath <[email protected]>
> > > > >> > > > > > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Can parallel read not be achieved by partitioning?
> > > > >> > > > > > > >
> > > > >> > > > > > > > Ram
> > > > >> > > > > > > >
> > > > >> > > > > > > > On Tue, Feb 16, 2016 at 1:01 AM, Priyanka Gugale <[email protected]>
> > > > >> > > > > > > > wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > Hi,
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > It is a common use case to read big files on HDFS in a
> > > > >> > > > > > > > > parallel fashion, i.e. many reader threads are used to
> > > > >> > > > > > > > > read the file in parallel. We can achieve this on top of
> > > > >> > > > > > > > > Apex using the following Malhar operators:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > 1. AbstractFileSplitter
> > > > >> > > > > > > > > 2. AbstractBlockReader
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > where FileSplitter, as per the file metadata, creates
> > > > >> > > > > > > > > small reader tasks (to read the file in parts). Those
> > > > >> > > > > > > > > reader tasks are run by BlockReaders in parallel to read
> > > > >> > > > > > > > > the file.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > As these operators are generally used together to
> > > > >> > > > > > > > > achieve the file read operation, I propose we create a
> > > > >> > > > > > > > > module, called HDFSFileReader, for this.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Please provide your suggestions on the same.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > -Priyanka
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
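
As a rough illustration of the splitter / block reader flow discussed earlier
in this thread (a splitter turns a file into reader tasks carrying an offset,
and block readers consume those tasks in parallel), here is a minimal sketch
in plain Python. It is not the Malhar API: BlockTask, split_file, read_block
and parallel_read are hypothetical names made up for this example, a local
file stands in for HDFS, and a thread pool stands in for reader partitions.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockTask:
    """Hypothetical reader task: which byte range of which file to read."""
    path: str
    offset: int
    length: int


def split_file(path, block_size):
    """Splitter role: derive one task per block from the file size."""
    size = os.path.getsize(path)
    return [BlockTask(path, off, min(block_size, size - off))
            for off in range(0, size, block_size)]


def read_block(task):
    """Block reader role: read only the byte range named by the task."""
    with open(task.path, "rb") as f:
        f.seek(task.offset)
        return f.read(task.length)


def parallel_read(path, block_size, readers=4):
    """Run the reader tasks on a small pool; blocks come back in task order."""
    tasks = split_file(path, block_size)
    with ThreadPoolExecutor(max_workers=readers) as pool:
        blocks = list(pool.map(read_block, tasks))
    return tasks, blocks
```

Without the splitting step there is nothing to tell a reader where to start,
so a single reader would have to consume the whole file, which is the point
made above about why partitioning alone is not enough.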
