On Thu, Aug 6, 2009 at 1:20 PM, Harold Valdivia Garcia <[email protected]> wrote:
> Hi...
>
> I was reading the HDFS code, and I can't find a way to read the
> replicated blocks of a block-file.
>
> DFS.getFileBlockLocations returns all blocks of a file:
> File = block-a, block-b, ..... block-n.
>
> Each of these blocks has its replicated blocks. If, for instance, the
> replication factor is 3, how can I retrieve block-a1, block-a2, block-a3 in
> parallel from my user code?
>
> I did read DFSClient and DFSClient.DFSInputStream to understand how Hadoop
> retrieves data from blocks, but it is hard.
> Is there no easy way to do this?

Correct - this is not a supported operation. People have discussed doing it,
but no one has put in the work to get it done. I think I may have
accidentally volunteered to do it at one point, but it hasn't been a
priority quite yet - it's an odd mapreduce job that can process data faster
than the datanode can serve it.

-Todd

>
> --
> ******************************************
> Harold Dwight Valdivia Garcia
> Graduate Student
> M.S Computer Engineering
> University of Puerto Rico, Mayaguez Campus
> ******************************************
>
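
As a rough illustration of the parallel read asked about above: the public API stops at listing, for each block, the datanodes that hold a replica (BlockLocation.getHosts()); picking a replica is left to the client internals. The sketch below covers only the planning step, splitting one block's byte range into one contiguous slice per replica host. ReplicaReadPlan, Slice, and plan are hypothetical names, not Hadoop classes, and actually fetching a slice from a chosen datanode is the part that is not supported.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: plans which replica host
// would serve which byte range of a single block.
final class ReplicaReadPlan {

    // One contiguous byte range of the block, assigned to one replica host.
    static final class Slice {
        final String host;
        final long offset;
        final long length;

        Slice(String host, long offset, long length) {
            this.host = host;
            this.offset = offset;
            this.length = length;
        }
    }

    // Splits [blockOffset, blockOffset + blockLength) into at most one slice
    // per host. Any remainder bytes go to the first hosts, one byte each.
    // Hosts that would receive zero bytes are skipped.
    static List<Slice> plan(String[] hosts, long blockOffset, long blockLength) {
        if (hosts == null || hosts.length == 0) {
            throw new IllegalArgumentException("at least one replica host is required");
        }
        if (blockLength < 0) {
            throw new IllegalArgumentException("blockLength must be non-negative");
        }
        List<Slice> slices = new ArrayList<Slice>();
        long base = blockLength / hosts.length;
        long extra = blockLength % hosts.length;
        long pos = blockOffset;
        for (int i = 0; i < hosts.length; i++) {
            long len = base + (i < extra ? 1 : 0);
            if (len == 0) {
                continue;
            }
            slices.add(new Slice(hosts[i], pos, len));
            pos += len;
        }
        return slices;
    }
}
```

In real code the hosts array would come from BlockLocation.getHosts() and the offset and length from BlockLocation.getOffset() and getLength(); each slice could then be handed to its own thread, assuming some way to read from a specific datanode existed.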
