Hi,

I think we can safely remove the unsupported character. My bad, I shouldn't
have kept it in the HDFSInput-specific code; it does no harm, but it doesn't
help in any way either. We had added it to avoid issues while copying from X
source to HDFS.
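
For reference, here is a minimal sketch of how a single ignore regex could
cover what the unsupported character does today. This is not Malhar code: the
class IgnoreFilterSketch and the method shouldIgnore are hypothetical names,
and the ":" alternative is an assumption added to the ".*._COPYING_" pattern
mentioned further down in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the Malhar library.
class IgnoreFilterSketch {
    // First alternative: the ".*._COPYING_" pattern quoted in this thread.
    // Second alternative (assumed): any name that contains a ':' character.
    static final Pattern IGNORE = Pattern.compile(".*._COPYING_|.*:.*");

    // Returns true when the whole file name matches the ignore pattern.
    static boolean shouldIgnore(String fileName) {
        return IGNORE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(shouldIgnore("part-0001._COPYING_")); // in-flight copy
        System.out.println(shouldIgnore("log:2016-05-09.txt"));  // contains ':'
        System.out.println(shouldIgnore("part-0001"));           // regular file
    }
}
```

With something like this, one regex property would be enough and the separate
character check could go away.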

-Priyanka

On Mon, May 9, 2016 at 12:05 PM, Priyanka Gugale <[email protected]>
wrote:

> Hi,
>
> As I remember, an issue was found during testing. The ":" is not supported
> by HDFS, so ideally such files shouldn't exist on HDFS, but I remember
> finding a bug. Let me look for a reference. If I can't find one, I will do
> some more testing around it and we can decide whether to remove it.
>
> -Priyanka
>
> On Mon, May 9, 2016 at 11:37 AM, Pramod Immaneni <[email protected]>
> wrote:
>
>> I see, let's wait for their response on the colon.
>>
>> Thanks
>>
>> On Mon, May 9, 2016 at 11:34 AM, Chandni Singh <[email protected]>
>> wrote:
>>
>> > I am adding the support to ignore files being copied, that is, the files
>> > that end with "_COPYING_" in the FileSplitterInput.
>> >
>> > However, I don't understand why the ignore character is set to ":". Why
>> > would files with ":" in the name/path exist on HDFS if these are
>> > unsupported by HDFS?
>> >
>> > Thanks,
>> > Chandni
>> >
>> > On Mon, May 9, 2016 at 11:29 AM, Pramod Immaneni <[email protected]>
>> > wrote:
>> >
>> > > Chandni,
>> > >
>> > > I agree with your original assessment that there shouldn't be a
>> > > separate operator if the new functionality falls under the
>> > > "functionality domain" of the original operator, and that the features
>> > > should just be added to the original operator. Based on your
>> > > description, I agree with points 1, 2, and 3.
>> > >
>> > > However, if you delete an operator that is useful in some use cases,
>> > > what is the substitute for that knowledge? For example, it looks like
>> > > the HDFSFileSplitter ignores some commonly present temporary files.
>> > > Does everyone have to learn this themselves and figure it out?
>> > >
>> > > Thanks
>> > >
>> > > On Fri, May 6, 2016 at 4:44 PM, Chandni Singh <[email protected]>
>> > > wrote:
>> > >
>> > > > Just saw that there is *HDFSFileSplitter* in the library as well.
>> > > > This sets *ignoreFilePatternRegularExp* to ".*._COPYING_" and
>> > > > *unsupportedChar* to ":".
>> > > >
>> > > > IMO this class should be removed as well.
>> > > >
>> > > > Chandni
>> > > >
>> > > > On Fri, May 6, 2016 at 4:16 PM, Chandni Singh <[email protected]>
>> > > > wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > FSFileSplitter was recently added to the Malhar library.
>> > > > > I have created
>> > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2081 to remove
>> > > > > this operator and add its functionality to the FileSplitterInput.
>> > > > >
>> > > > > The reason is that this extension adds just 3 trivial features,
>> > > > > which makes it difficult for users to know which operator to use.
>> > > > > It also adds more classes that essentially do the same thing.
>> > > > >
>> > > > > This operator adds 3 properties to FileSplitterInput.
>> > > > >
>> > > > > 1. ignoreFilePatternRegularExp: regular expression that specifies
>> > > > > which files to ignore.
>> > > > > This is useful to have in the FileSplitterInput.
>> > > > >
>> > > > > 2. unsupportedChar: first of all, this is a String. Any file having
>> > > > > this String will be ignored.
>> > > > > IMO this is redundant. #1 can be used to accomplish this.
>> > > > > I think this should be removed.
>> > > > >
>> > > > > 3. sequentialFileReader: when this property is set, the block
>> > > > > metadata of the same file all have the same hashcode. I think this
>> > > > > may have been done so that all the block metadata of a particular
>> > > > > file goes to the same block reader.
>> > > > >
>> > > > > IMO this is a hacky way of accomplishing this. If an application
>> > > > > needs this, it should be done using a StreamCodec.
>> > > > >
>> > > > > I think this should be removed.
>> > > > >
>> > > > > Thanks,
>> > > > > Chandni
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
