Hi,

I'm trying to figure out how data is transferred between the client and a
DataNode in Hadoop v1.8.

This is my understanding so far:

The client first fires an OP_READ_BLOCK request. The DataNode responds with
a status code, a checksum header, the chunk offset, the packet length, the
sequence number, a last-packet boolean, the length, and the data (in that
order).
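
For reference, here's roughly how I'm parsing the stream, in that field
order (the field widths are my own guesses from skimming
RemoteBlockReader, so any of them may be wrong):

    import java.io.DataInputStream;
    import java.io.IOException;

    // Sketch of how I'm currently parsing the DataNode's response, in
    // the field order described above. Field widths are guesses.
    class ReadBlockResponseSketch {
        static void readHeader(DataInputStream in) throws IOException {
            short status       = in.readShort();   // status code
            byte checksumType  = in.readByte();    // checksum header: type,
            int bytesPerSum    = in.readInt();     //   bytes per checksum
            long chunkOffset   = in.readLong();    // chunk offset
            int packetLen      = in.readInt();     // packet length
            long seqno         = in.readLong();    // sequence number
            boolean lastPacket = in.readBoolean(); // last-packet boolean
            int dataLen        = in.readInt();     // the "Length" field
            byte[] data        = new byte[dataLen];
            in.readFully(data);                    // then the data itself?
        }
    }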

However, I'm running into an issue. First of all, which of these lengths
describes the length of the data? I tried both PacketLength and Length,
but both seem to leave unread data on the stream (I tried to "cat" a file
containing the numbers 1-1000). Both attempts are sketched below.
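
This is what the two attempts look like (again just a sketch continuing
the one above; the helper and its parameters are mine, not Hadoop's):

    import java.io.DataInputStream;
    import java.io.IOException;

    // The two interpretations of the payload size I tried. Neither one
    // drains the stream cleanly, so neither field seems to be the plain
    // length of the data.
    class LengthAttemptsSketch {
        static byte[] readPayload(DataInputStream in, int packetLen,
                                  int dataLen, boolean usePacketLen)
                throws IOException {
            // Attempt 1: PacketLength counts the payload bytes.
            // Attempt 2: Length counts the payload bytes.
            byte[] data = new byte[usePacketLen ? packetLen : dataLen];
            in.readFully(data);  // either way, bytes remain on the stream
            return data;
        }
    }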

Also, how does the DataNode signal the start of another packet? After
"Length" bytes have been read, I assumed that the header would be
repeated, but that doesn't appear to be the case: none of the header
fields come back with sane values. My assumption looked like the sketch
below.
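
Here is the assumption I was testing (a sketch; skipping dataLen bytes
stands in for reading the payload):

    import java.io.DataInputStream;
    import java.io.IOException;

    // My assumption about packet boundaries: after dataLen payload
    // bytes, the per-packet header simply repeats. In practice the
    // values read here are garbage, so the next packet must start
    // somewhere else.
    class NextPacketSketch {
        static void readNextPacketHeader(DataInputStream in, int dataLen)
                throws IOException {
            in.skipBytes(dataLen);              // what I took to be the payload
            int packetLen      = in.readInt();  // these come back as nonsense
            long seqno         = in.readLong();
            boolean lastPacket = in.readBoolean();
            int nextDataLen    = in.readInt();
        }
    }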

I've looked through the DataXceiver, BlockSender, and DFSClient
(RemoteBlockReader) classes, but I still can't quite grasp how this data
transfer is conducted.

Any help would be appreciated,

Dhaivat Pandya
