With Flume you could use batch mode. Flume waits until a set number of events has been delivered (let's say 100) and then bulk-writes them into HDFS (as an example). On top of that you can set a timeout, meaning that if the batch size is not reached within x seconds, whatever has been collected so far is written out. That is useful for very small files (Avro, maybe) and will decrease the stress on the NameNode (NN).
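To make the count-or-timeout idea concrete, here is a minimal sketch in Python. It is not Flume's code or API. The names `BatchBuffer`, `write_fn` and `tick` are made up for illustration, and the write function stands in for whatever actually writes to HDFS.

```python
import time


class BatchBuffer:
    """Hypothetical buffer: flush on batch size or on timeout, whichever comes first."""

    def __init__(self, batch_size, timeout_sec, write_fn, clock=time.monotonic):
        self.batch_size = batch_size      # e.g. 100 events
        self.timeout_sec = timeout_sec    # max age of a partial batch
        self.write_fn = write_fn          # stand-in for the bulk write to HDFS
        self.clock = clock                # injectable so the timeout can be tested
        self.events = []
        self.first_event_at = None

    def append(self, event):
        # Start the timeout when the first event of a new batch arrives.
        if not self.events:
            self.first_event_at = self.clock()
        self.events.append(event)
        # A full batch is written out immediately.
        if len(self.events) >= self.batch_size:
            self.flush()

    def tick(self):
        # Call periodically: writes a partial batch once it is older than the timeout.
        if self.events and self.clock() - self.first_event_at >= self.timeout_sec:
            self.flush()

    def flush(self):
        # One bulk write per batch instead of one write per event.
        if self.events:
            self.write_fn(list(self.events))
            self.events.clear()
            self.first_event_at = None
```

With `batch_size=100`, a hundred events cause a single write instead of a hundred, which is where the reduced load on the NN comes from. The timeout only limits how long a partial batch can sit in memory.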

cheers,
Alex

Nguyen Manh Tien wrote:

You are correct.
I think the bottleneck may be in the namenode when there are too many
small files. HDFS is for big files, not for so many small files.
