I see. Let's wait for their response on the colon.

Thanks

On Mon, May 9, 2016 at 11:34 AM, Chandni Singh <[email protected]>
wrote:

> I am adding support to FileSplitterInput for ignoring files that are still
> being copied, that is, files that end with "_COPYING_".
>
> However, I don't understand why the ignored character is set to ":". Why
> would files with ":" in the name/path exist on HDFS if HDFS does not
> support them?
>
> Thanks,
> Chandni
>
> On Mon, May 9, 2016 at 11:29 AM, Pramod Immaneni <[email protected]>
> wrote:
>
> > Chandni,
> >
> > I agree with your original assessment that there shouldn't be a separate
> > operator if the new functionality falls under the "functionality domain"
> > of the original operator, and that the features should just be added to
> > the original operator. Based on your description, I agree with points 1,
> > 2, and 3.
> >
> > However, if you delete an operator that is useful in some use cases, what
> > is the substitute for that knowledge? For example, it looks like the
> > HDFSFileSplitter ignores some commonly present temporary files. Does
> > everyone have to learn this themselves and figure it out?
> >
> > Thanks
> >
> > On Fri, May 6, 2016 at 4:44 PM, Chandni Singh <[email protected]>
> > wrote:
> >
> > > Just saw that there is *HDFSFileSplitter* in the library as well.
> > > This sets *ignoreFilePatternRegularExp* to ".*._COPYING_" and
> > > *unsupportedChar* to ":".
> > >
> > > IMO this class should be removed as well.
> > >
> > > Chandni
> > >
> > > On Fri, May 6, 2016 at 4:16 PM, Chandni Singh <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Recently FSFileSplitter was added to the Malhar library.
> > > > I have created https://issues.apache.org/jira/browse/APEXMALHAR-2081
> > > > to remove this operator and add its functionality to
> > > > FileSplitterInput.
> > > >
> > > > The reason is that this extension adds just 3 trivial features, which
> > > > makes it difficult for the user to know which operator to use. It
> > > > also adds more classes that essentially do the same thing.
> > > >
> > > > This operator adds 3 properties to FileSplitterInput.
> > > >
> > > > 1. ignoreFilePatternRegularExp: a regular expression that specifies
> > > > which files to ignore.
> > > > This is useful to have in the FileSplitterInput.
> > > >
> > > > 2. unsupportedChar: first of all, this is a String. Files having this
> > > > String will be ignored.
> > > > IMO this is redundant. #1 can be used to accomplish this.
> > > > I think this should be removed.
> > > >
> > > > 3. sequentialFileReader: when this property is set, the block
> > > > metadata of the same file have the same hashcode. I think this may
> > > > have been done so that all the block metadata of a particular file
> > > > go to the same block reader.
> > > >
> > > > IMO this is a hacky way of accomplishing it. If an application
> > > > needs this, it should be done using a StreamCodec.
> > > >
> > > > I think this should be removed.
> > > >
> > > > Thanks,
> > > > Chandni
> > > >
> > >
> >
>
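For reference, here is a minimal sketch of the point raised in the thread that a single ignore regular expression can cover both the "_COPYING_" files and the ":" check, which would make a separate unsupportedChar property redundant. It does not use the Malhar classes. The class name, the method names, and the choice to match against the whole file name are assumptions made for illustration, and the dot before "_COPYING_" is escaped here, which is slightly stricter than the ".*._COPYING_" value quoted above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Malhar: models one ignore regex doing both jobs.
final class IgnorePatternSketch {
    // Assumed combined pattern: names ending in "._COPYING_" OR names containing ':'.
    static final Pattern IGNORE = Pattern.compile(".*\\._COPYING_|.*:.*");

    // Returns true when the whole file name matches the ignore pattern.
    static boolean shouldIgnore(String fileName) {
        return IGNORE.matcher(fileName).matches();
    }

    // Keeps only the names that are not ignored, preserving input order.
    static List<String> accepted(List<String> fileNames) {
        return fileNames.stream()
                .filter(name -> !shouldIgnore(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a.txt", "b.txt._COPYING_", "c:d.txt");
        System.out.println(accepted(names));
    }
}
```

Whether the real operator matches the file name or the full path, and whether it requires a full match or a partial one, is not stated in this thread, so the sketch should be adjusted to whatever the operator actually does.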

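The StreamCodec suggestion in the thread can be illustrated with a small sketch of the underlying idea: derive the partition key from the file path alone, so that every block of a file is routed to the same reader without changing the hashcode of the block metadata itself. This is not the Apex StreamCodec API. BlockMeta, partitionKey, and readerIndex are hypothetical names, and the modulo mapping onto readers is an assumption about how a key might be turned into a partition.

```java
// Hypothetical model of path-based partitioning; it does not use Apex or Malhar types.
final class FilePathPartitionSketch {

    // Stand-in for block metadata: which file the block belongs to and its id.
    static final class BlockMeta {
        final String filePath;
        final long blockId;

        BlockMeta(String filePath, long blockId) {
            this.filePath = filePath;
            this.blockId = blockId;
        }
    }

    // The key depends only on the file path, so all blocks of one file share it.
    static int partitionKey(BlockMeta block) {
        return block.filePath.hashCode();
    }

    // Maps the key onto one of `readers` partitions; floorMod keeps the index non-negative.
    static int readerIndex(BlockMeta block, int readers) {
        return Math.floorMod(partitionKey(block), readers);
    }

    public static void main(String[] args) {
        BlockMeta first = new BlockMeta("/data/a.log", 0L);
        BlockMeta second = new BlockMeta("/data/a.log", 1L);
        // Both blocks belong to the same file, so they map to the same reader.
        System.out.println(readerIndex(first, 4) == readerIndex(second, 4));
    }
}
```

The design point is that the routing decision lives in a separate function rather than in the tuple's hashCode, which is the separation the thread argues for.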