I see, let's wait for their response on the colon.

Thanks
On Mon, May 9, 2016 at 11:34 AM, Chandni Singh wrote:
> I am adding support to FileSplitterInput for ignoring files that are
> still being copied, that is, files whose names end with "_COPYING_".
>
> However, I don't understand why the ignore character is set to ":". Why
> would files with ":" in their name/path exist on HDFS if such names are
> not supported by HDFS?
>
> Thanks,
> Chandni
>
> On Mon, May 9, 2016 at 11:29 AM, Pramod Immaneni wrote:
>
> > Chandni,
> >
> > I agree with your original assessment that there shouldn't be a
> > separate operator if the new functionality falls under the
> > "functionality domain" of the original operator, and that the features
> > should just be added to the original operator. Based on your
> > description, I agree with points 1, 2, and 3.
> >
> > However, if you delete an operator that is useful in some use cases,
> > what is the substitute for that knowledge? For example, the
> > HDFSFileSplitter seems to ignore some commonly present temporary
> > files. Does everyone have to learn this and figure it out themselves?
> >
> > Thanks
> >
> > On Fri, May 6, 2016 at 4:44 PM, Chandni Singh wrote:
> >
> > > I just saw that there is *HDFSFileSplitter* in the library as well.
> > > It sets *ignoreFilePatternRegularExp* to ".*._COPYING_" and
> > > *unsupportedChar* to ":".
> > >
> > > IMO this class should be removed as well.
> > >
> > > Chandni
> > >
> > > On Fri, May 6, 2016 at 4:16 PM, Chandni Singh wrote:
> > >
> > > > Hi,
> > > >
> > > > Recently, FSFileSplitter was added to the Malhar library.
> > > > I have created https://issues.apache.org/jira/browse/APEXMALHAR-2081
> > > > to remove this operator and add its functionality to
> > > > FileSplitterInput.
> > > >
> > > > The reason is that this extension adds just 3 trivial features,
> > > > which makes it difficult for the user to know which operator to
> > > > use. It also adds more classes that essentially do the same thing.
> > > >
> > > > This operator adds 3 properties to FileSplitterInput.
> > > >
> > > > 1. ignoreFilePatternRegularExp: a regular expression that specifies
> > > >    which files to ignore.
> > > >    This is useful to have in FileSplitterInput.
> > > >
> > > > 2. unsupportedChar: first of all, this is a String. Files with this
> > > >    String in their name/path will be ignored.
> > > >    IMO this is redundant, since #1 can be used to accomplish this.
> > > >    I think this should be removed.
> > > >
> > > > 3. sequentialFileReader: when this property is set, the block
> > > >    metadata of the same file have the same hashcode. I think this
> > > >    may have been done so that all the block metadata of a particular
> > > >    file go to the same block reader.
> > > >
> > > >    IMO this is a hacky way of accomplishing this. If an application
> > > >    needs this, it should be done using a StreamCodec.
> > > >
> > > >    I think this should be removed.
> > > >
> > > > Thanks,
> > > > Chandni
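To illustrate point 2 of the thread: a single ignore regex can cover both the in-progress copies and the ":" case, so a separate unsupportedChar property adds nothing. The standalone sketch below assumes full-name matching with `java.util.regex` and assumes in-progress copies carry a `._COPYING_` suffix, as `hadoop fs -put` uses for its temporary files. The class `IgnoreFilterSketch` and its methods are hypothetical and are not the FileSplitterInput API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class IgnoreFilterSketch {
    // Hypothetical helper mirroring the idea of an "ignoreFilePatternRegularExp"
    // property; this is NOT the Malhar FileSplitterInput API.
    private final Pattern ignorePattern;

    public IgnoreFilterSketch(String ignoreRegex) {
        this.ignorePattern = Pattern.compile(ignoreRegex);
    }

    /** Returns true when the whole file name matches the ignore pattern. */
    public boolean shouldIgnore(String fileName) {
        return ignorePattern.matcher(fileName).matches();
    }

    /** Keeps only the names that do not match the ignore pattern. */
    public List<String> filter(List<String> fileNames) {
        List<String> accepted = new ArrayList<>();
        for (String name : fileNames) {
            if (!shouldIgnore(name)) {
                accepted.add(name);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // One alternation covers in-progress copies and names containing ':'.
        IgnoreFilterSketch filter = new IgnoreFilterSketch(".*._COPYING_|.*:.*");
        List<String> names = Arrays.asList(
            "part-0001.txt", "part-0002.txt._COPYING_", "bad:name.txt");
        System.out.println(filter.filter(names)); // [part-0001.txt]
    }
}
```

With this approach the ":" rule is just one more alternative in the regex, which is the redundancy point 2 makes.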

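For point 3, the alternative raised in the thread is to leave the block metadata's hashCode alone and make the routing decision on the stream instead. In Apex, a StreamCodec can supply a per-tuple partition value (assumed here to be the `getPartition` method of the `StreamCodec` interface). The standalone sketch below only mimics that idea without depending on Apex. `BlockMeta`, `partitionKey`, and `readerIndex` are hypothetical names, and the modulo step is a simplification of how a platform maps partition values onto partitions.

```java
import java.util.Arrays;
import java.util.List;

public class FilePathPartitionSketch {

    // Hypothetical stand-in for a block-metadata tuple (not the Malhar class).
    // It keeps the default identity hashCode, so distinct blocks stay distinct.
    static final class BlockMeta {
        final String filePath;
        final long blockId;

        BlockMeta(String filePath, long blockId) {
            this.filePath = filePath;
            this.blockId = blockId;
        }
    }

    // Hypothetical partition key in the spirit of a StreamCodec partition hook:
    // it depends only on the file path, never on the block id.
    static int partitionKey(BlockMeta block) {
        return block.filePath.hashCode();
    }

    // Simplified mapping of the key onto one of numReaders block readers.
    // floorMod keeps the index non-negative even for negative hash codes.
    static int readerIndex(BlockMeta block, int numReaders) {
        return Math.floorMod(partitionKey(block), numReaders);
    }

    public static void main(String[] args) {
        List<BlockMeta> blocks = Arrays.asList(
            new BlockMeta("/data/a.txt", 1),
            new BlockMeta("/data/a.txt", 2),
            new BlockMeta("/data/b.txt", 3),
            new BlockMeta("/data/b.txt", 4));
        // Every block of the same file prints the same reader index.
        for (BlockMeta b : blocks) {
            System.out.println(b.filePath + " block " + b.blockId
                + " -> reader " + readerIndex(b, 4));
        }
    }
}
```

Keeping this logic in the stream's partitioning, rather than in the tuple class, leaves hashCode/equals with their usual meaning and makes per-file routing an explicit, opt-in choice for the applications that need it.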