elton sky wrote:
Steve,

Seems HP has done block-based parallel reading from different datanodes.

Yes; very much like IBM's GPFS, only with JBOD storage and the option of running code near the data when appropriate.

Though not at the disk level, they achieve a 4Gb/s rate with 9 readers (500Mb/s
each).
I didn't see anywhere I could download their code to play around with, a pity~


I do have access to that code if I can get at the right bit of the repository. If you really want me to look at it in detail, ask, with the caveat that I'm away for the rest of the month and somewhat busy. Apart from that, there's no reason why I shouldn't be able to make the changes to DfsClient public. Keep reminding me :)


BTW, can we specify which disk to read from with Java?


I think right now you get a list of blocks via DfsClient.getBlockLocations(); for each block this gives you the hosts where replicas live. There is no data about which disk on a specific host holds the block.
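
For reference, a rough (untested) sketch of getting at that list through the public FileSystem API rather than DfsClient directly; all it gives you per block is an offset, a length and the hostnames holding replicas:

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockHosts {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path(args[0]);
    FileStatus status = fs.getFileStatus(file);
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      // Hostnames only; nothing here says which disk on the datanode holds the block.
      System.out.println("offset=" + block.getOffset()
          + " length=" + block.getLength()
          + " hosts=" + Arrays.toString(block.getHosts()));
    }
  }
}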

I believe that what Russ did was move the decision out of DfsInputStream (which picks a block location for you, with a bias towards the local host) and instead let the calling program decide where to fetch each block from. This meant he could set the renderer up to request blocks from different hosts.
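
To make that concrete, here's a rough sketch of the sort of decision the calling program gets to make once the block-to-host mapping is in its hands. The fetch call itself is hypothetical; it stands in for whatever got exposed from DfsClient, not anything the stock client offers:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.BlockLocation;

public class BlockHostPicker {

  /**
   * Pick a replica host for each block, favouring whichever host has been
   * handed the least work so far, so reads spread across the datanodes
   * instead of piling onto the local/first replica.
   */
  public static String[] assignHosts(BlockLocation[] blocks) throws IOException {
    Map<String, Integer> assigned = new HashMap<String, Integer>();
    String[] choice = new String[blocks.length];
    for (int i = 0; i < blocks.length; i++) {
      String chosen = null;
      int chosenLoad = Integer.MAX_VALUE;
      for (String host : blocks[i].getHosts()) {
        Integer load = assigned.get(host);
        int current = (load == null) ? 0 : load.intValue();
        if (current < chosenLoad) {
          chosen = host;
          chosenLoad = current;
        }
      }
      if (chosen != null) {
        assigned.put(chosen, chosenLoad + 1);
        choice[i] = chosen;
        // The fetch itself would go through the modified client, e.g. a
        // hypothetical readBlockFrom(chosen, blocks[i].getOffset(),
        // blocks[i].getLength()); not something stock DfsClient provides.
      }
    }
    return choice;
  }
}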

He had tried to use the JT to schedule the rendering code, but that didn't work, as MapReduce is built around the notion of "reduction": less data comes out than goes in, so it moves work to where the data is. Rendering is more "MapExpand": the operation transforms PDF pages into 600dpi, 32bpp bitmaps, which then need to be streamed to the (very large) printer at its print rate, in the correct order. It was easiest to have a specific machine on the cluster (with no datanodes or TaskTrackers) set up to do the rendering, and just ask the filesystem where things are.

Like I said, I don't think there was anything tricky done in DfsClient; it was more a matter of making some data that is known internally to the DfsClient code public, so that the client app can decide where to fetch data from. If the DfsClient also knew which HDD the data was on within a datanode, the client app could use that in its decision making too, so that if the 9 machines each had 6 HDDs, you could keep all of them busy.
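
Purely speculative, since HDFS doesn't actually report which disk a replica sits on, but the decision making would then just key its work queues on (host, disk) pairs rather than on hosts; with 9 machines of 6 HDDs each, that's 54 spindles to keep fed. Something like:

import java.io.IOException;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

import org.apache.hadoop.fs.BlockLocation;

public class DiskAwareQueues {

  /**
   * Bucket blocks by "host:disk" so that one fetcher per bucket keeps every
   * spindle in the cluster busy.
   */
  public static Map<String, Queue<BlockLocation>> bucketBySpindle(BlockLocation[] blocks)
      throws IOException {
    Map<String, Queue<BlockLocation>> queues = new HashMap<String, Queue<BlockLocation>>();
    for (BlockLocation block : blocks) {
      String[] hosts = block.getHosts();
      if (hosts.length == 0) {
        continue;
      }
      // diskOf() is hypothetical: BlockLocation carries no disk information,
      // which is exactly the gap being discussed.
      String key = hosts[0] + ":" + diskOf(block, hosts[0]);
      Queue<BlockLocation> queue = queues.get(key);
      if (queue == null) {
        queue = new ArrayDeque<BlockLocation>();
        queues.put(key, queue);
      }
      queue.add(block);
    }
    return queues;
  }

  private static int diskOf(BlockLocation block, String host) {
    return 0; // placeholder: no such metadata exists in the current API
  }
}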
