elton sky wrote:
> Steve,
> Seems HP has done block-based parallel reading from different datanodes.
yes; very much like IBM's GPFS, only with JBOD storage and the option of
running code near the data when appropriate.
> Though not at the disk level, they achieve a 4Gb/s aggregate rate with
> 9 readers (500Mb/s each).
> I didn't see anywhere I could download their code to play around with,
> pity~
I do have access to that code if I can get at the right bit of the
repository; if you really want me to look at it in detail, ask, with the
caveat that I'm away for the rest of the month and somewhat busy. Apart
from that there's no reason why I shouldn't be able to make the changes
to DfsClient public. Keep reminding me :)
> BTW, can we specify which disk to read from with Java?
I think right now you get a list of blocks via
DfsClient.getBlockLocations(); this is a list of the hosts where each
block lives. There is no data about which disk on the specific host
holds it.
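As a minimal sketch of that lookup -going through the public
FileSystem/BlockLocation API rather than DfsClient itself; the class
name and the file path argument are just for illustration- it looks
something like this:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockHosts {
  public static void main(String[] args) throws Exception {
    // The file whose block layout we want, e.g. /jobs/pages.pdf
    Path file = new Path(args[0]);
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus status = fs.getFileStatus(file);

    // One BlockLocation per block: offset, length and the hosts holding
    // a replica. Nothing here says which disk on the host it lives on.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.println("offset=" + block.getOffset()
          + " len=" + block.getLength()
          + " hosts=" + Arrays.toString(block.getHosts()));
    }
  }
}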
I believe that what Russ did was move the decision out of DfsInputStream
-which picks a block location for you, with a bias towards the local
host- and instead let the calling program decide where to fetch each
block from. That meant he could set the renderer up to request blocks
from different hosts.
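A rough sketch of that kind of client-side choice, assuming you already
have the BlockLocation[] from the snippet above; actually directing the
read at the chosen host still needs the DfsClient changes discussed
here, as the stock client won't let you pin a datanode:

import org.apache.hadoop.fs.BlockLocation;

public class BlockHostChooser {
  /**
   * Pick a host for each block, round-robining over the replicas so
   * that consecutive blocks are fetched from different datanodes.
   */
  public static String[] chooseHosts(BlockLocation[] blocks)
      throws Exception {
    String[] chosen = new String[blocks.length];
    for (int i = 0; i < blocks.length; i++) {
      String[] replicas = blocks[i].getHosts();
      chosen[i] = replicas[i % replicas.length];   // naive spread
    }
    return chosen;
  }
}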
He had tried to use the JT to schedule the rendering code, but that
didn't work as MapReduce has the notion of "reduction": less data out
than in, so it moves work to where the data is. In rendering it's more
MapExpand; the operation is the transformation of PDF pages into 600dpi
32bpp bitmaps, which then need to be streamed to the (very large)
printer at its print rate, in the correct order. It was easiest to have
a specific machine on the cluster -with no datanodes or TTs- set up to
do the rendering, and just ask the filesystem where things are.
Like I said, I don't think there was anything tricky done in DfsClient;
it was more a matter of making public some data that the DfsClient code
already knows internally, so that the client app can decide where to
fetch data from. If the DfsClient knew which HDD the data was on within
a datanode, the client app could use that in its decision making too,
so that if the 9 machines each had 6 HDDs, you could keep them all busy.
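Purely hypothetical, since HDFS doesn't expose per-disk placement today,
but if block locations ever carried a disk ID alongside the host names,
the client-side chooser could balance over (host, disk) pairs instead of
just hosts; a least-loaded pick along these lines would keep all 54
spindles busy:

import java.util.HashMap;
import java.util.Map;

public class HostDiskChooser {
  /**
   * For each block, candidates[i] lists the (host, diskId) pairs
   * holding a replica -exactly the information HDFS does not give you
   * today- e.g. "node3:disk5". Pick the pair with the fewest blocks
   * already assigned so every spindle on every datanode stays busy.
   */
  public static String[] choose(String[][] candidates) {
    Map<String, Integer> load = new HashMap<String, Integer>();
    String[] chosen = new String[candidates.length];
    for (int i = 0; i < candidates.length; i++) {
      String best = null;
      int bestLoad = Integer.MAX_VALUE;
      for (String hostDisk : candidates[i]) {
        int assigned = load.containsKey(hostDisk) ? load.get(hostDisk) : 0;
        if (assigned < bestLoad) {
          best = hostDisk;
          bestLoad = assigned;
        }
      }
      chosen[i] = best;
      load.put(best, bestLoad + 1);
    }
    return chosen;
  }
}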