I think HARs maintain an index of every file's boundaries within the part files they create, so reading a particular archived file "seeks" to that file's starting offset within the block and reads only its bytes. It does not read the entire block just to retrieve that one file.
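A toy sketch of the idea (not Hadoop's actual HAR code, and the file names and helper below are made up for illustration): the archive keeps an index mapping each file name to an (offset, length) pair, so a lookup becomes one seek plus a bounded read rather than a scan of the whole blob.

```python
import io

# Hypothetical example data standing in for the small files.
files = {"a.txt": b"hello", "b.txt": b"world!"}

# Build the "part" blob and the index in a single pass.
part = io.BytesIO()
index = {}  # filename -> (offset, length)
for name, data in files.items():
    index[name] = (part.tell(), len(data))
    part.write(data)

def read_file(name):
    """Retrieve one file by seeking to its recorded offset."""
    offset, length = index[name]
    part.seek(offset)          # jump straight to the file's start
    return part.read(length)   # read only that file's bytes

print(read_file("b.txt"))
```

The point is that the cost of retrieving one file is proportional to that file's size, not the archive's; Hadoop's real HAR layout uses separate _index/_masterindex files alongside the part files to the same effect.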
On Wed, Nov 24, 2010 at 11:22 PM, Jason Ji <jason_j...@yahoo.com> wrote:
> Hi guys,
>
> We plan to use Hadoop HDFS as the storage for lots of little files.
> According to the documentation, it is recommended to use Hadoop Archives
> to pack those little files for better performance.
>
> Our question is: since HDFS reads an entire (say, 64 MB) block every
> time, does that mean that every time we retrieve a single file inside
> the archive, HDFS will still read the whole block as well? If not, what
> is the actual behavior, and is there any way we can verify it?
>
> Thanks in advance.
> Jason

--
Harsh J
www.harshj.com