With Flume you could use batch mode. Flume waits until a given number of
events (let's say 100) has been delivered, and then bulk-writes them into
HDFS (as an example). On top of that you can set a timeout: if the batch
size is not reached within x seconds, the partial batch is written out
anyway. That is useful for very small files (Avro, maybe), and it will
reduce the stress on the NameNode (NN).
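To make the size-or-timeout logic concrete, here is a minimal Python sketch. The `Batcher` class, its `flush_fn` callback and the `tick()` method are hypothetical names for illustration only. This is not Flume's API or configuration, just the batching idea described above. The timeout is checked by calling `tick()` periodically (e.g. from a timer) rather than with a background thread, and the injectable `clock` exists only so the timeout can be shown deterministically.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Hypothetical size-or-timeout batcher (illustration, not Flume's API).

    Events are buffered until either `batch_size` events have arrived or
    `timeout_s` seconds have passed since the first buffered event; then the
    whole buffer is handed to `flush_fn` as one bulk write.
    """

    def __init__(self, batch_size: int, timeout_s: float,
                 flush_fn: Callable[[List[bytes]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.batch_size = batch_size
        self.timeout_s = timeout_s
        self.flush_fn = flush_fn
        self.clock = clock
        self.buffer: List[bytes] = []
        self.first_event_at: Optional[float] = None

    def append(self, event: bytes) -> None:
        # Start the timeout window when the first event of a batch arrives.
        if not self.buffer:
            self.first_event_at = self.clock()
        self.buffer.append(event)
        # Size trigger: a full batch is written out immediately.
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically; writes out a partial batch once the timeout expires."""
        if self.buffer and self.clock() - self.first_event_at >= self.timeout_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn(self.buffer)  # one bulk write instead of many tiny ones
            self.buffer = []            # fresh list; the flushed one is not mutated
            self.first_event_at = None


if __name__ == "__main__":
    writes = []
    now = [0.0]  # fake clock so the timeout can be demonstrated
    b = Batcher(batch_size=100, timeout_s=5.0,
                flush_fn=lambda evs: writes.append(len(evs)),
                clock=lambda: now[0])
    for _ in range(250):
        b.append(b"event")
    print(writes)  # [100, 100]  (50 events still buffered)
    now[0] = 6.0
    b.tick()
    print(writes)  # [100, 100, 50]  (timeout flushed the partial batch)
```

The point for HDFS is that 250 events end up in 3 writes instead of 250, so far fewer small files (and NameNode objects) are created.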
cheers,
Alex
Nguyen Manh Tien wrote:
You are correct.
I think the bottleneck may be the NameNode when there are too many
small files. HDFS is designed for big files, not for lots of small files.