If you're looking at automated file/record/event collection, take a look at Apache Flume: http://incubator.apache.org/flume/. It handles distributed collection well and is very configurable.
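For reference, a minimal single-agent Flume NG configuration along these lines might look like the sketch below. The agent name, directories, and NameNode address are illustrative assumptions, not from this thread; check the Flume User Guide for the exact properties your version supports.

```properties
# Illustrative Flume agent: pick up files dropped into a local spool
# directory and stream them into HDFS. All names/paths are assumptions.
agent.sources = r1
agent.channels = c1
agent.sinks = k1

# Source: watch a local directory for completed files
agent.sources.r1.type = spooldir
agent.sources.r1.spoolDir = /var/data/xml
agent.sources.r1.channels = c1

# Channel: buffer events between source and sink
agent.channels.c1.type = memory
agent.channels.c1.capacity = 10000

# Sink: write events to HDFS as plain text, rolling by size (~128 MB)
agent.sinks.k1.type = hdfs
agent.sinks.k1.hdfs.path = hdfs://namenode:8020/data/xml
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.hdfs.rollSize = 134217728
agent.sinks.k1.hdfs.rollCount = 0
agent.sinks.k1.channel = c1
```

Rolling by size rather than event count helps avoid the small-files problem mentioned below.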
Otherwise, write a scheduled script to do the uploads every X period (your choice). Consider using https://github.com/edwardcapriolo/filecrush or similar tools too, if your files are very small and getting in the way of MR processing.

On Fri, Jul 13, 2012 at 8:59 AM, Manoj Babu <manoj...@gmail.com> wrote:
> Hi,
>
> I need to upload large XML files daily. Right now I have a small
> program that reads all the files from a local folder and writes them to
> HDFS as a single file. Is this the right way?
> If there are any best practices or a more optimized way to achieve this,
> kindly let me know.
>
> Thanks in advance!
>
> Cheers!
> Manoj.

--
Harsh J
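The scheduled-script approach could be sketched roughly like this; all paths, names, and the cron schedule are illustrative assumptions, not something from the thread:

```shell
#!/bin/sh
# Hypothetical sketch of a periodic HDFS upload script. Schedule via cron,
# e.g. hourly:
#   0 * * * * /usr/local/bin/upload_xml.sh

upload_xml() {
  src_dir=$1                        # local folder holding incoming XML files
  dest_dir=$2                       # target HDFS directory
  hadoop_cmd=${HADOOP:-hadoop}      # override (e.g. HADOOP=echo) to dry-run

  # Make sure the destination directory exists in HDFS
  "$hadoop_cmd" fs -mkdir -p "$dest_dir" || return 1

  for f in "$src_dir"/*.xml; do
    [ -e "$f" ] || continue         # glob matched nothing; skip
    # Delete the local copy only after a successful upload
    "$hadoop_cmd" fs -put "$f" "$dest_dir"/ && rm "$f"
  done
}

# Example invocation (normally the last line of the cron script); a dated
# HDFS path keeps any single directory from growing unbounded:
#   upload_xml "${SRC_DIR:-/var/data/xml}" "/data/xml/$(date +%Y/%m/%d)"
```

Deleting only after `fs -put` succeeds means a failed run simply retries the same files on the next cron tick.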