Sergey Zhemzhitsky created CAMEL-8542:
-----------------------------------------

             Summary: hdfs & hdfs2 components are merging data locally instead of streaming it
                 Key: CAMEL-8542
                 URL: https://issues.apache.org/jira/browse/CAMEL-8542
             Project: Camel
          Issue Type: Improvement
          Components: camel-hdfs
    Affects Versions: 2.15.0
            Reporter: Sergey Zhemzhitsky


Here is the 
[conversation|http://camel.465427.n5.nabble.com/HDFS2-Component-and-NORMAL-FILE-type-td5764655.html]

CAMEL-4555 introduced the ability to merge files within a single directory.
The merge is done locally, i.e. the whole merged file is first created on the 
local file system, which can be space- and time-consuming for multi-gigabyte 
or multi-terabyte files.

# It would be more efficient to stream these files directly from HDFS, for 
example by wrapping them in a 
[SequenceInputStream|http://docs.oracle.com/javase/7/docs/api/java/io/SequenceInputStream.html]
 or something like 
[MapReducePartInputStreamEnumeration|https://github.com/yahoo/Glimmer/blob/master/src/main/java/com/yahoo/glimmer/util/MapReducePartInputStreamEnumeration.java]
# It would also be useful to be able to switch merging on and off by means of 
an option or parameter.
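
A minimal sketch of the streaming idea from the first point, using only java.io. 
The class name LazyPartStreams, the concat method and the in-memory "parts" are 
hypothetical and stand in for real HDFS files; in the component each opener would 
presumably wrap something like Hadoop's FileSystem.open(Path), which is an 
assumption here and not the current implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;

public class LazyPartStreams {

    /**
     * Concatenates the parts into one stream without copying them anywhere.
     * SequenceInputStream opens the first part on construction and each later
     * part only once the previous one is exhausted (and closed).
     */
    public static InputStream concat(List<Callable<InputStream>> openers) {
        final Iterator<Callable<InputStream>> it = openers.iterator();
        Enumeration<InputStream> parts = new Enumeration<InputStream>() {
            @Override
            public boolean hasMoreElements() {
                return it.hasNext();
            }

            @Override
            public InputStream nextElement() {
                try {
                    // Assumption: for HDFS this would be fileSystem.open(path).
                    return it.next().call();
                } catch (Exception e) {
                    throw new IllegalStateException("cannot open part", e);
                }
            }
        };
        return new SequenceInputStream(parts);
    }

    // Stand-in for a remote file: an in-memory stream with the given text.
    private static InputStream bytes(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        List<Callable<InputStream>> openers = Arrays.asList(
                () -> bytes("part-0\n"),
                () -> bytes("part-1\n"));

        // Copy the merged view to stdout through a small fixed-size buffer.
        try (InputStream in = concat(openers)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                System.out.write(buf, 0, n);
            }
        }
        System.out.flush();
    }
}
```

With this approach the consumer needs only one buffer's worth of memory and no 
local disk space, regardless of the total size of the parts.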



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
