Hi guys,
We plan to use Hadoop HDFS to store lots of small files. According to the documentation, Hadoop Archives (HAR) are recommended for packing such small files together to get better performance. Our question: since, as we understand it, HDFS reads an entire block (say, 64 MB) every time, does that mean that whenever we retrieve just a single file inside the archive, HDFS will still read the whole block? If not, what is the actual behavior, and is there any way we can verify it?

Thanks in advance,
Jason
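P.S. To make the question concrete: as far as we understand, a HAR stores the small files concatenated inside larger part files, with an index recording each file's part file, offset and length. The plain-Java sketch below illustrates the access pattern we have in mind: look up the offset, seek, and request only that file's bytes. It uses local files only and is not the Hadoop API; the class and method names (`PackedArchiveSketch`, `pack`, `readMember`) are our own, hypothetical ones.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical local-file analogy of an archive: small files are concatenated
// into one "part" file and an index maps each name to (offset, length).
// This is NOT the Hadoop HAR API; it only illustrates the access pattern.
public class PackedArchiveSketch {

    // Position of one member inside the part file.
    static final class Entry {
        final long offset;
        final int length;

        Entry(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Writes the members back to back into partFile and returns the index.
    static Map<String, Entry> pack(Map<String, byte[]> files, Path partFile) throws IOException {
        Map<String, Entry> index = new LinkedHashMap<>();
        try (RandomAccessFile out = new RandomAccessFile(partFile.toFile(), "rw")) {
            out.setLength(0); // start from an empty part file
            for (Map.Entry<String, byte[]> f : files.entrySet()) {
                long offset = out.getFilePointer();
                out.write(f.getValue());
                index.put(f.getKey(), new Entry(offset, f.getValue().length));
            }
        }
        return index;
    }

    // Reads a single member: seek to its offset and request exactly its
    // length, instead of reading the whole part file.
    static byte[] readMember(Path partFile, Map<String, Entry> index, String name) throws IOException {
        Entry e = index.get(name);
        if (e == null) {
            throw new IOException("no such member: " + name);
        }
        byte[] buf = new byte[e.length];
        try (RandomAccessFile in = new RandomAccessFile(partFile.toFile(), "r")) {
            in.seek(e.offset);
            in.readFully(buf);
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        Path part = Files.createTempFile("part-0", ".bin");
        try {
            Map<String, byte[]> files = new LinkedHashMap<>();
            files.put("a.txt", "hello".getBytes(StandardCharsets.UTF_8));
            files.put("b.txt", "small file b".getBytes(StandardCharsets.UTF_8));
            Map<String, Entry> index = pack(files, part);
            byte[] b = readMember(part, index, "b.txt");
            System.out.println(new String(b, StandardCharsets.UTF_8)); // prints "small file b"
        } finally {
            Files.delete(part);
        }
    }
}
```

Running `main` prints `small file b`. If HDFS served an archived file the same way, a lookup would cost an index read plus a read of that file's byte range rather than the whole 64 MB block; whether it actually behaves like this is what we would like confirmed.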