Priyanka/Chaitanya,

Can you please let me know the reason to have ":" as an unsupported
character?

I don't understand when a file with ":" in its name would exist on HDFS, or
why we should be ignoring it in the operator.
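
If such files ever do need to be skipped, a single ignore regex could cover
both the "_COPYING_" case and the ":" case. A minimal sketch, assuming the
pattern is applied to the file name with a full match (I have not checked
how the operator actually applies it); the class and method names here are
hypothetical:

```java
import java.util.regex.Pattern;

public class IgnorePatternSketch {
    // Assumption: the ignore pattern is matched against the whole file name.
    // The first alternative is the value HDFSFileSplitter sets today; the
    // second alternative covers any name that contains ":".
    static final Pattern IGNORE = Pattern.compile(".*._COPYING_|.*:.*");

    // Returns true when the file name should be skipped.
    static boolean isIgnored(String fileName) {
        return IGNORE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIgnored("part-0001._COPYING_"));
        System.out.println(isIgnored("log:2016-05-06.txt"));
        System.out.println(isIgnored("data.csv"));
    }
}
```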

Thanks,
Chandni

On Fri, May 6, 2016 at 4:44 PM, Chandni Singh <[email protected]>
wrote:

> Just saw that there is an *HDFSFileSplitter* in the library as well.
> This sets *ignoreFilePatternRegularExp* to ".*._COPYING_" and
> *unsupportedChar* to ":".
>
> IMO this class should be removed as well.
>
> Chandni
>
> On Fri, May 6, 2016 at 4:16 PM, Chandni Singh <[email protected]>
> wrote:
>
> > Hi,
> >
> > FSFileSplitter was recently added to the Malhar library.
> > I have created https://issues.apache.org/jira/browse/APEXMALHAR-2081 to
> > remove this operator and add its functionality to FileSplitterInput.
> >
> > The reason is that this extension adds just 3 trivial features, which
> > makes it difficult for the user to know which operator to use. It
> > adds more classes which essentially do the same thing.
> >
> > This operator adds 3 properties to FileSplitterInput.
> >
> > 1. ignoreFilePatternRegularExp: regular expression that specifies which
> > files to ignore.
> > This is useful to have in the FileSplitterInput.
> >
> > 2. unsupportedChar: first of all, this is a String. Files having this
> > String will be ignored.
> > IMO this is redundant; #1 can be used to accomplish this.
> > I think this should be removed.
> >
> > 3. sequentialFileReader: when this property is set, all the block
> > metadata of the same file have the same hashcode. I think this may have
> > been done so that all the block metadata of a particular file go to the
> > same block reader.
> >
> > IMO this is a hacky way of accomplishing this. If an application needs
> > this, then it should be done using a StreamCodec.
> >
> > I think this should be removed.
> >
> > Thanks,
> > Chandni
> >
>
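
To illustrate the StreamCodec suggestion in point 3: the idea is to derive
the partition key from the file path alone, instead of overriding the
hashcode of the block metadata. The sketch below is self-contained and does
not use the Apex classes; BlockMeta, getPartition and route are hypothetical
stand-ins. getPartition plays the role of StreamCodec.getPartition, and the
modulo routing is only illustrative, not how the engine actually assigns
partitions.

```java
public class SameFilePartitionSketch {
    // Hypothetical stand-in for block metadata; only the file path matters here.
    static final class BlockMeta {
        final String filePath;
        final long offset;

        BlockMeta(String filePath, long offset) {
            this.filePath = filePath;
            this.offset = offset;
        }
    }

    // Plays the role of StreamCodec.getPartition(T): the key depends only on
    // the file path, so every block of one file yields the same value.
    static int getPartition(BlockMeta block) {
        return block.filePath.hashCode();
    }

    // Illustrative routing only: maps the partition key onto one of
    // `readers` downstream block readers.
    static int route(BlockMeta block, int readers) {
        return Math.floorMod(getPartition(block), readers);
    }

    public static void main(String[] args) {
        BlockMeta first = new BlockMeta("/data/a.txt", 0L);
        BlockMeta second = new BlockMeta("/data/a.txt", 134217728L);
        // Both blocks belong to the same file, so they reach the same reader.
        System.out.println(route(first, 4) == route(second, 4));
    }
}
```

With this approach the block metadata keeps its normal equals/hashCode, and
the "same file, same reader" behaviour becomes an explicit choice on the
stream.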
