Hi,

I think we can safely remove that unsupported character. My bad, I shouldn't have kept it in the HDFSInput-specific code; it does no harm, but it doesn't help in any way. We had added it to avoid issues while copying from X source to HDFS.
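
To illustrate why the separate character check is redundant, here is a minimal sketch (not the operator's actual code) of a single ignore regex covering both cases. It assumes the pattern is matched against the whole file name with java.util.regex; the class and method names are made up for illustration, and the dot before _COPYING_ is escaped here, unlike the ".*._COPYING_" default quoted below.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Malhar: shows one regex doing the work of
// both ignoreFilePatternRegularExp and unsupportedChar.
class IgnorePatternSketch {
    // Ignore names ending in "._COPYING_" or containing ":" anywhere.
    static final Pattern IGNORE = Pattern.compile(".*\\._COPYING_|.*:.*");

    // Assumption: the whole file name must match the pattern to be ignored.
    static boolean shouldIgnore(String fileName) {
        return IGNORE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String[] names = {"data.csv", "data.csv._COPYING_", "log:2016.txt"};
        for (String name : names) {
            System.out.println(name + " -> " + shouldIgnore(name));
        }
    }
}
```

If the colon check turns out not to be needed at all, the second alternative can simply be dropped from the pattern.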

-Priyanka

On Mon, May 9, 2016 at 12:05 PM, Priyanka Gugale <[email protected]> wrote:

> Hi,
>
> As I remember there was some issue found during testing. The ":" is not
> supported by HDFS, ideally such files shouldn't exist on HDFS, but I
> remember to have found some bug. Let me look up for reference. If I can't
> find one I will do some more testing around it and we can decide to remove
> it.
>
> -Priyanka
>
> On Mon, May 9, 2016 at 11:37 AM, Pramod Immaneni <[email protected]>
> wrote:
>
>> I see, lets wait for their response on the colon.
>>
>> Thanks
>>
>> On Mon, May 9, 2016 at 11:34 AM, Chandni Singh <[email protected]>
>> wrote:
>>
>> > I am adding the support to ignore files being copied, that is, the files
>> > that end with "_COPYING_" in the FileSplitterInput.
>> >
>> > However I don't understand the ignore character set to ":". Why will there
>> > be files with ":" in the name/path exist on hdfs if these are unsupported
>> > by hdfs.
>> >
>> > Thanks,
>> > Chandni
>> >
>> > On Mon, May 9, 2016 at 11:29 AM, Pramod Immaneni <[email protected]>
>> > wrote:
>> >
>> > > Chandni,
>> > >
>> > > I agree with your original assessment that there shouldn't be a separate
>> > > operator if the new functionality falls under the "functionality domain" of
>> > > the original operator and the features should just be added to the original
>> > > operator. Based on your description, I agree with points 1. 2. and 3.
>> > >
>> > > However if you delete an operator that is useful in some use cases, what is
>> > > the substitute for that knowledge? For example look like the
>> > > HDFSFileSplitter seems to ignore some commonly present temporary files. Do
>> > > everyone have to learn this themselves and figure it out?
>> > >
>> > > Thanks
>> > >
>> > > On Fri, May 6, 2016 at 4:44 PM, Chandni Singh <[email protected]>
>> > > wrote:
>> > >
>> > > > Just saw that there is *HDFSFileSplitter* in the library as well.
>> > > > This sets *ignoreFilePatternRegularExp* to ".*._COPYING_" and
>> > > > *unsupportedChar* to ":",
>> > > >
>> > > > IMO this class should be removed as well.
>> > > >
>> > > > Chandni
>> > > >
>> > > > On Fri, May 6, 2016 at 4:16 PM, Chandni Singh <[email protected]>
>> > > > wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > Recently there was FSFileSplitter added to Malhar library.
>> > > > > I have created https://issues.apache.org/jira/browse/APEXMALHAR-2081 to
>> > > > > remove this operator and adds its functionality to the FileSplitterInput.
>> > > > >
>> > > > > The reason to do so is because this extension just adds 3 trivial features
>> > > > > which makes it difficult for the user to know which operator to use. It
>> > > > > adds more classes which essentially do the same thing.
>> > > > >
>> > > > > This operator add 3 properties to FileSplitterInput.
>> > > > >
>> > > > > 1. ignoreFilePatternRegularExp: regular expression that specifies which
>> > > > > files to ignore.
>> > > > > This is useful to have in the FileSplitterInput.
>> > > > >
>> > > > > 2. unsupportedChar: first of all this is a String. File having this String
>> > > > > will be ignored.
>> > > > > IMO this is redundant. #1 can be used to accomplish this.
>> > > > > I think this should be removed.
>> > > > >
>> > > > > 3. sequentialFileReader: when this property is set, the block metadata of
>> > > > > the same files have the same hashcode. This I think may have been done so
>> > > > > that all the block metadata of a particular file go to the same block
>> > > > > reader.
>> > > > >
>> > > > > IMO this is a hacky way of accomplishing this. If an application needs
>> > > > > this then this should have been done using a StreamCodec.
>> > > > >
>> > > > > I think this should be removed.
>> > > > >
>> > > > > Thanks,
>> > > > > Chandni
