Hello All, Could anyone give me some information on how Flume handles small files? If Flume agents are set up for text log files, how does Flume ensure that it does not produce many small files? I believe that waiting for a fixed time before pushing data to HDFS does not guarantee block-sized files.
I am trying to write a client app that collects data into HDFS directly using the Java APIs, and I am sure I will run into this issue. Are there any utilities or tricks for combining files already in HDFS into larger files (without an MR job)? Any help will be greatly appreciated. -R
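
P.S. To make the question concrete, here is a rough sketch of the size-based rolling I have in mind for my own client, as opposed to rolling on a fixed timer. Everything in it is an assumption for illustration: the class name `SizeBasedRoller`, the threshold parameter, and the in-memory list that stands in for files written to HDFS. In a real client the `roll()` step would presumably close the current HDFS output stream and open a new file instead. I do not know whether Flume does anything similar.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Flume or Hadoop): buffers records and
// "rolls" a new file only once a size threshold has been reached.
class SizeBasedRoller {
    private final long rollSizeBytes;
    private final ByteArrayOutputStream current = new ByteArrayOutputStream();
    // In-memory stand-in for files that a real client would write to HDFS.
    private final List<byte[]> rolledFiles = new ArrayList<>();

    SizeBasedRoller(long rollSizeBytes) {
        if (rollSizeBytes <= 0) {
            throw new IllegalArgumentException("rollSizeBytes must be positive");
        }
        this.rollSizeBytes = rollSizeBytes;
    }

    // Appends one newline-terminated record and rolls if the threshold is reached.
    void append(String record) {
        byte[] bytes = (record + "\n").getBytes(StandardCharsets.UTF_8);
        current.write(bytes, 0, bytes.length);
        if (current.size() >= rollSizeBytes) {
            roll();
        }
    }

    // Closes the current "file" (if non-empty) and starts a new one.
    // Can also be called on shutdown to flush a partial file.
    void roll() {
        if (current.size() == 0) {
            return;
        }
        rolledFiles.add(current.toByteArray());
        current.reset();
    }

    List<byte[]> rolledFiles() {
        return rolledFiles;
    }

    int pendingBytes() {
        return current.size();
    }
}
```

With the threshold set near the HDFS block size, files would only be closed once they are large enough, but a slow source could then leave data sitting in the buffer for a long time, which is the trade-off I am unsure how to handle.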
