You're correct: currently HDFS only supports reading from files that have been closed. You can configure Flume to roll (close) its output files in small enough chunks that you can process them incrementally.
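For example, with a Flume NG-style agent configuration (a minimal sketch; the agent/component names, path, and thresholds below are illustrative, not from your setup), the HDFS sink's roll* settings control how often the current file is closed and a new one started:

  # Sketch of an agent that tails a local file and writes rolled chunks to HDFS.
  agent1.sources  = src1
  agent1.channels = ch1
  agent1.sinks    = sink1

  # Hypothetical source: tail a local log file on the client machine.
  agent1.sources.src1.type = exec
  agent1.sources.src1.command = tail -F /var/log/app/input.log
  agent1.sources.src1.channels = ch1

  agent1.channels.ch1.type = memory

  agent1.sinks.sink1.type = hdfs
  agent1.sinks.sink1.channel = ch1
  agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/incoming
  agent1.sinks.sink1.hdfs.fileType = DataStream
  # Roll (close) the current file every 5 minutes or at ~64 MB, whichever
  # comes first; rolling by event count is disabled.
  agent1.sinks.sink1.hdfs.rollInterval = 300
  agent1.sinks.sink1.hdfs.rollSize = 67108864
  agent1.sinks.sink1.hdfs.rollCount = 0

Each time the sink rolls, the previous file is closed and becomes visible to MapReduce, so you can kick off a map task (or a small job) per closed file, which is essentially the "multiple smaller files" approach described below.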
-Joey

On Nov 22, 2011, at 2:01, Romeo Kienzler <[email protected]> wrote:

> Hi,
>
> I'm planning to use Flume in order to stream data from a local client machine
> into HDFS running on a cloud environment.
>
> Is there a way to start a mapper already on an incomplete file? As far as I
> know, a file in HDFS has to be closed before a mapper can start.
>
> Is this true?
>
> Any possible idea for a solution of this problem?
>
> Or do I have to write smaller chunks of my big input file, create multiple
> files in HDFS, and start a separate map task on each file once it has been
> closed?
>
> Best Regards,
>
> Romeo
>
> Romeo Kienzler
> r o m e o @ o r m i u m . d e
