Hi Jason,

I think the archive maintains the starting offset of each file within the block, so there is no need to read the whole block every time we want to read a small file. Also, as we know, HDFS uses a large block size (64 MB by default) for performance reasons, whatever the file size; for the same reason the archive maintains indexes and starts reading at the file's offset rather than at the beginning of the block.
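To make that concrete, here is a minimal Java sketch of the access pattern (the archive path and file name below are placeholders of my own, not from your setup). Opening a path through the har:// scheme goes through HarFileSystem, which resolves the file against the archive's index and seeks to its offset in the underlying part file, so only the requested bytes are streamed:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HarReadExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Hypothetical archive and member file; adjust to your layout.
    Path file = new Path("har:///user/jason/files.har/small.txt");
    // Resolves to HarFileSystem because of the har:// scheme.
    FileSystem fs = file.getFileSystem(conf);
    try (FSDataInputStream in = fs.open(file)) {
      // Only the bytes of small.txt are read, starting at the
      // offset recorded in the archive index, not the whole block.
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) > 0) {
        System.out.write(buf, 0, n);
      }
      System.out.flush();
    }
  }
}

(The archive itself would be created beforehand with the usual "hadoop archive -archiveName files.har -p <parent> <src> <dest>" command.)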
-Thanks and Regards,
Rahul Patodi
Associate Software Engineer,
Impetus Infotech (India) Private Limited,
www.impetus.com
Mob: 09907074413

On Wed, Nov 24, 2010 at 11:49 PM, Harsh J <qwertyman...@gmail.com> wrote:
> I think HARs maintain indexes of all the file boundaries in the blocks
> created, and therefore it would "seek" to the beginning point within
> the block to begin reading a particular file. So it does not exactly
> "read" the entire block to retrieve that file.
>
> On Wed, Nov 24, 2010 at 11:22 PM, Jason Ji <jason_j...@yahoo.com> wrote:
> > Hi guys,
> >
> > We plan to use Hadoop HDFS as the storage to store lots of little files.
> >
> > According to the documentation, it is recommended to use Hadoop Archive
> > to compress those little files to get better performance.
> >
> > Our question is this: since HDFS reads an entire, say, 64 MB block every
> > time, does it mean that every time we try to retrieve a single file
> > inside the archive, HDFS will still read the whole block as well?
> >
> > If not, what's the actual behavior? Is there any way we can verify it?
> >
> > Thanks in advance.
> >
> > Jason
>
> --
> Harsh J
> www.harshj.com