I am aware of the small files problem, and I have read this article: http://www.cloudera.com/blog/2009/02/the-small-files-problem/
I am just wondering whether anything has really changed since that article was written. For example's sake, suppose you just want an HDFS cluster that never runs any MapReduce (m/r) jobs but stores an MP3 of every song known to exist, in an /ARTIST/ALBUM/song kind of structure. If someone wanted a song, they could just go to HDFS://U2/Joshua Tree/withOrWithoutyou.mp3. Yes, I know there are practical issues with this example, such as search and browsing, but let's ignore those.

I don't really want to have to write a file system on top of a file system for this kind of example, so I imagine I would use HAR (Hadoop Archives), but I wanted to know if there are any other thoughts out there. Also, are there any tips and tricks for using HAR, such as auto archiving, things like that?

Ananth T Sarathy
