Sorry in advance if I derail this, but I'm not sure what it would take
to actually implement such an operation. The initial pushback might just
be "use the block locations and assign the tablet yourself", since
that's essentially what HBase does (not suggesting there isn't something
better to do, just a hunch).
IMO, we don't have a lot of information on locality presently. I was
thinking it would be nice to create a tool that helps us understand
what our locality actually looks like.
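To make that concrete, here is the kind of sketch I have in mind, in
Java. The FileSystem#getFileBlockLocations call is a real Hadoop API;
the class name and the way the tablet's file list gets passed in are
just placeholders.

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalityReport {

  // Fraction of the tablet's file bytes that have a replica on the host
  // currently serving the tablet.
  static double localityFor(FileSystem fs, List<Path> tabletFiles, String tserverHost)
      throws IOException {
    long totalBytes = 0;
    long localBytes = 0;
    for (Path file : tabletFiles) {
      FileStatus status = fs.getFileStatus(file);
      for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
        totalBytes += block.getLength();
        for (String host : block.getHosts()) {
          if (host.equals(tserverHost)) {
            localBytes += block.getLength();
            break;
          }
        }
      }
    }
    return totalBytes == 0 ? 1.0 : (double) localBytes / totalBytes;
  }
}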
My guess is that after this, our next big gain would be choosing a
better candidate for where to move a tablet in the case of rebalancing,
splits, and previous-server failure (pretty much all of the times that we
aren't/can't put the tablet back on its previous location). I'm not sure
how far this would get us combined with the favored nodes API, e.g. a
tablet has some favored datanodes which we include in the HDFS calls, and
we can try to put the tablet on one of those nodes and assume that HDFS
will have the blocks there.
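For reference, the favored nodes hint is exposed as a create overload on
DistributedFileSystem that takes an InetSocketAddress[] of datanodes (the
exact overload may differ a bit across Hadoop versions). A rough sketch,
with the helper method itself made up:

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class FavoredNodesWrite {

  static OutputStream createWithFavoredNodes(DistributedFileSystem dfs, Path file,
      InetSocketAddress[] favoredNodes) throws IOException {
    // HDFS treats favoredNodes as a hint; it can still place replicas elsewhere
    // if the favored datanodes are down or full, so locality isn't guaranteed.
    return dfs.create(file, FsPermission.getFileDefault(), true /* overwrite */,
        dfs.getConf().getInt("io.file.buffer.size", 4096),
        dfs.getDefaultReplication(file), dfs.getDefaultBlockSize(file),
        null /* progress */, favoredNodes);
  }
}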
tl;dr I'd want to have examples of how the current API is
insufficient before lobbying for new HDFS APIs.
Keith Turner wrote:
I just thought of one potential issue with this. The same file can be
shared by multiple tablets on different tservers. If there are more than
3 tablets sharing a file, it could cause problems if all of them request a
local replica. So if HDFS had this operation, Accumulo would have to be
careful about which files it requested local blocks for.
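A rough sketch of what that caution could look like: count how many
tablets reference a file before asking for local blocks. The
tablet-count map here is assumed to come from a metadata table scan and
is not an existing API.

import java.util.Map;

public class SharedFileCheck {

  static final int REPLICATION = 3;

  // Only request local blocks for files shared by no more tablets than there
  // are replicas; otherwise not every hosting tserver can get a local copy.
  static boolean safeToRequestLocalReplica(Map<String,Integer> fileToTabletCount, String file) {
    return fileToTabletCount.getOrDefault(file, 0) <= REPLICATION;
  }
}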
On Tue, Jun 30, 2015 at 11:00 AM, Keith Turner <[email protected]> wrote:
There was a discussion on IRC about balancing and locality yesterday. I
was thinking about the locality problem, and started thinking about the
possibility of having an HDFS operation that would force a file to have
local replicas. I think this approach has the following pros over forcing
a compaction.
* Only one replica is copied across the network.
* Avoids decompressing, deserializing, serializing, and compressing data.
The tricky part about this approach is that Accumulo needs to decide when
to ask HDFS to make a file local. This decision could be based on a
function of the file size and the number of recent accesses.
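As a strawman, that function could look something like the following;
the thresholds and inputs are placeholders, not anything that exists in
Accumulo today.

public class LocalizePolicy {

  static final long MAX_BYTES_TO_COPY = 1L << 30; // skip files over 1 GB
  static final int MIN_RECENT_ACCESSES = 100;     // only bother with hot files

  // Ask HDFS for a local replica only for hot, not-too-large files that are
  // not already mostly local.
  static boolean shouldRequestLocalReplica(long fileSizeBytes, int recentAccesses,
      double currentLocality) {
    return currentLocality < 0.5
        && recentAccesses >= MIN_RECENT_ACCESSES
        && fileSizeBytes <= MAX_BYTES_TO_COPY;
  }
}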
We could avoid decompressing, deserializing, etc. today by just copying
(not compacting) frequently accessed files. However, this would write 3
replicas where an HDFS operation would only write one.
Note: for the assertion that only one replica would need to be copied, I
was thinking of the following 3 initial conditions. I am assuming we want
to avoid having all three replicas on the same rack.
* Zero replicas on the rack: copy a replica to the node and drop a
replica on another rack.
* One replica on the rack: copy a replica to the node and drop any other
replica.
* Two replicas on the rack: copy a replica to the node and drop one of
the replicas already on that rack.
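Put as code, the case analysis above is just the following (assuming the
hypothetical "make a replica local to this node" operation; the names are
made up):

public class RelocatePlan {

  enum Drop { REPLICA_ON_ANOTHER_RACK, ANY_OTHER_REPLICA, REPLICA_ON_SAME_RACK }

  // In every case exactly one replica crosses the network (the copy to the
  // target node); the cases differ only in which existing replica is dropped
  // so that all three replicas don't end up on the target node's rack.
  static Drop planCopy(int replicasOnTargetRack) {
    switch (replicasOnTargetRack) {
      case 0: return Drop.REPLICA_ON_ANOTHER_RACK;
      case 1: return Drop.ANY_OTHER_REPLICA;
      case 2: return Drop.REPLICA_ON_SAME_RACK;
      default: throw new IllegalArgumentException("expected 0, 1, or 2 replicas on the rack");
    }
  }
}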