On Tue, Jun 30, 2015 at 11:39 AM, Josh Elser <[email protected]> wrote:
> Sorry in advance if I derail this, but I'm not sure what it would take to actually implement such an operation. The initial pushback might just be "use the block locations and assign the tablet yourself", since that's essentially what HBase does (not suggesting there isn't something better to do, just a hunch).
>
> IMO, we don't have a lot of information on locality presently. I was thinking it would be nice to create a tool to help us understand locality at all.
>
> My guess is that after this, our next big gain would be choosing a better candidate for where to move a tablet in the case of rebalancing, splits and previous-server failure (pretty much all of the times that we aren't/can't put the tablet back to its previous loc). I'm not sure how far this would get us combined with the favored nodes API, e.g. a Tablet has some favored datanodes which we include in the HDFS calls and we can try to put the tablet on one of those nodes and assume that HDFS will have the blocks there.
>
> tl;dr I'd want to have examples of how the current API is insufficient before lobbying for new HDFS APIs.

Having some examples of how the status quo is insufficient is a good idea.

I was trying to think of situations where there are no suitable nodes that have *all* of a tablet's file blocks local. In those situations the best we can hope for is a node that has the largest subset of a tablet's file blocks. I think the following scenarios can leave us with no node that has all of a tablet's file blocks.

* Added X new tablet servers. Tablets moved in order to evenly spread tablets.
* A lot of tablets in a table just split. In order to evenly spread tablets across the cluster, need to move them.
* Decommissioned X tablet servers. Tablets moved in order to evenly spread tablets.
* A tablet has been hosted on multiple tablet servers and as a result there is no single datanode that has all of its file blocks.
* Tablet servers run on a subset of the datanodes. As the ratio of tservers to datanodes goes lower, the ability to find a datanode with many of a tablet's file blocks goes down.
* Decommissioning or adding datanodes could also throw off a tablet's locality.

Are there other cases I am missing?
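As for the locality tool Josh mentioned, something along the lines of the untested sketch below is roughly what I would expect it to do with the existing API: count, per datanode host, how many of a tablet's file blocks are local, and when no host has everything take the host with the largest subset. Class and method names here are just placeholders.

    import java.io.IOException;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TabletLocality {

      // For each datanode host, count how many of the tablet's file blocks it stores.
      static Map<String,Integer> blocksPerHost(FileSystem fs, Collection<Path> tabletFiles)
          throws IOException {
        Map<String,Integer> counts = new HashMap<>();
        for (Path file : tabletFiles) {
          FileStatus stat = fs.getFileStatus(file);
          for (BlockLocation block : fs.getFileBlockLocations(stat, 0, stat.getLen())) {
            for (String host : block.getHosts()) {
              Integer count = counts.get(host);
              counts.put(host, count == null ? 1 : count + 1);
            }
          }
        }
        return counts;
      }

      // When no host has every block, the best candidate is the host with the largest subset.
      static String bestCandidate(FileSystem fs, Collection<Path> tabletFiles) throws IOException {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String,Integer> entry : blocksPerHost(fs, tabletFiles).entrySet()) {
          if (entry.getValue() > bestCount) {
            best = entry.getKey();
            bestCount = entry.getValue();
          }
        }
        return best;
      }
    }

Something like this would also let us measure how much locality we actually lose in the scenarios listed above.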
> Keith Turner wrote:
>
>> I just thought of one potential issue with this. The same file can be shared by multiple tablets on different tservers. If there are more than 3 tablets sharing a file, it could cause problems if all of them request a local replica. So if hdfs had this operation, Accumulo would have to be careful about which files it requested local blocks for.
>>
>> On Tue, Jun 30, 2015 at 11:00 AM, Keith Turner <[email protected]> wrote:
>>
>>> There was a discussion on IRC about balancing and locality yesterday. I was thinking about the locality problem, and started thinking about the possibility of having a HDFS operation that would force a file to have local replicas. I think this approach has the following pros over forcing a compaction.
>>>
>>> * Only one replica is copied across the network.
>>> * Avoids decompressing, deserializing, serializing, and compressing data.
>>>
>>> The tricky part about this approach is that Accumulo needs to decide when to ask HDFS to make a file local. This decision could be based on a function of the file size and number of recent accesses.
>>>
>>> We could avoid decompressing, deserializing, etc today by just copying (not compacting) frequently accessed files. However this would write 3 replicas where a HDFS operation would only write one.
>>>
>>> Note for the assertion that only one replica would need to be copied I was thinking of the following 3 initial conditions. I am assuming we want to avoid all three replicas on the same rack.
>>>
>>> * Zero replicas on rack: can copy replica to node and drop replica on another rack.
>>> * One replica on rack: can copy replica to node and drop any other replica.
>>> * Two replicas on rack: can copy replica to node and drop another replica on the same rack.
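Coming back to the favored nodes API Josh brought up: if I remember right, DistributedFileSystem already has a create() overload that takes favored datanodes (HDFS-2576), so the "copy, don't compact" idea could at least hint where the new replicas should go. A rough, untested sketch is below; the paths, hostname and port are made up.

    import java.io.IOException;
    import java.net.InetSocketAddress;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.io.IOUtils;

    public class CopyWithFavoredNodes {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Assumes fs.defaultFS points at HDFS.
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

        // Made-up file names, just for illustration.
        Path src = new Path("/accumulo/tables/2/t-0001/F0000abc.rf");
        Path dst = new Path("/accumulo/tables/2/t-0001/C0000def.rf");

        // Hint to the namenode which datanodes we would like the new blocks on,
        // e.g. the datanode colocated with the tablet's current tserver.
        InetSocketAddress[] favored = new InetSocketAddress[] {
            new InetSocketAddress("tserver1.example.com", 50010)};

        FSDataOutputStream out = dfs.create(dst, FsPermission.getFileDefault(),
            true /* overwrite */, 64 * 1024 /* buffer */,
            dfs.getDefaultReplication(dst), dfs.getDefaultBlockSize(dst),
            null /* progress */, favored);
        FSDataInputStream in = dfs.open(src);
        try {
          // Straight byte copy: no decompress/deserialize/reserialize/recompress.
          IOUtils.copyBytes(in, out, conf, false);
        } finally {
          in.close();
          out.close();
        }
      }
    }

This still pays the full cost of writing every replica, which is exactly what the hypothetical "make a replica local" operation would avoid, but it does skip the decompression and recompression work of a compaction.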
