On Thu, Aug 6, 2009 at 1:20 PM, Harold Valdivia Garcia <
[email protected]> wrote:

> Hi... I was reading the HDFS code, and I can't find a way to read the
> replicated blocks of a block-file.
>
> DFS.getFileBlockLocations returns all blocks of a file
> File = block-a, block-b, ..... block-n.
>
> Each of these blocks has its own replicas. If, for instance, the
> replication factor is 3, how can I retrieve block-a1, block-a2, block-a3 in
> parallel from my user code?
>
> I read DFSClient and DFSClient.DFSInputStream to understand how Hadoop
> retrieves data from blocks, but it is hard to follow.
> Is there no easy way to do this?


Correct - this is not a supported operation. People have discussed doing it,
but no one has put in the work to get it done. I think I may have
accidentally volunteered to do it at one point, but it hasn't been a
priority quite yet - it's a rare MapReduce job that can process data faster
than the datanode can serve it.

-Todd



>
> --
> ******************************************
> Harold Dwight Valdivia Garcia
> Graduate Student
> M.S Computer Engineering
> University of Puerto Rico, Mayaguez Campus
> ******************************************
>
