I am reading "Hadoop: The Definitive Guide", and on page 71 it says that when there are too many small files, the NameNode's memory gets eaten up, since each file's metadata must be kept in the NameNode. The book also suggests using Hadoop Archives (HAR files) to pack many small files into HDFS blocks.
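For reference, a HAR is created with the `hadoop archive` tool. A minimal sketch (the paths and archive name here are hypothetical, and the job needs a running Hadoop cluster):

```shell
# Pack all small files under /user/jiamin/input into one archive.
# "hadoop archive" runs a MapReduce job that writes files.har
# (a few large part files plus an index), so the NameNode only
# has to track a handful of objects instead of thousands.
hadoop archive -archiveName files.har -p /user/jiamin/input /user/jiamin/output

# The archived files remain readable through the har:// filesystem:
hadoop fs -ls har:///user/jiamin/output/files.har
```

Note that HARs reduce NameNode memory pressure but do not speed up reads, since each file access still goes through the archive's index.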
Hope this helps! Best regards, Jiamin Lu
