On Thu, May 26, 2011 at 12:02 PM, Jason Rutherglen
<jason.rutherg...@gmail.com> wrote:
> Todd, thanks!
>
>> In general, though, keep in mind that, whenever you write data, you'll
>> get a local copy first, if the writer is in the cluster. That's how
>> HBase gets locality for most of its accesses
>
> Right. However in the failover scenario where a node goes down
> (hardware failure, or either of the processes, such as the DataNode,
> RegionServer, etc), then I think the new RS will not have local data?
> We could first make a request that all necessary HDFS files go local
> prior to the new RS being available. At least for search to work this
> is a requirement.

Yep, we've thrown this idea around before, but I'm not sure whether
there's an HBASE JIRA for it.

>
>> There are some non-public APIs to do this -- have a look at how the
>> Balancer works - the dispatch() function is the guts you're looking
>> for. It might be nice to expose this functionality as a "limited
>> private evolving" API
>
> Perhaps simply mark them as 'expert' or make them package private?
> I'll work on a patch.

Sounds good. Keep in mind there's a fair bit of subtlety to it. For
example, say you have two racks: A with 2 replicas, and B with one
replica. A node in rack A requests a local replica. In this case we
have to make sure that we move one of the A replicas and not the B
replica (i.e. we must respect the NN's rack replication policy).

-Todd

> On Thu, May 26, 2011 at 11:40 AM, Todd Lipcon <t...@cloudera.com> wrote:
>> Hey Jason,
>>
>> There are some non-public APIs to do this -- have a look at how the
>> Balancer works - the dispatch() function is the guts you're looking
>> for. It might be nice to expose this functionality as a "limited
>> private evolving" API.
>>
>> In general, though, keep in mind that, whenever you write data, you'll
>> get a local copy first, if the writer is in the cluster. That's how
>> HBase gets locality for most of its accesses.
>>
>> -Todd
>>
>> On Thu, May 26, 2011 at 11:36 AM, Jason Rutherglen
>> <jason.rutherg...@gmail.com> wrote:
>>> Is there a way to send a request to the name node to replicate
>>> block(s) to a specific DataNode? If not, what would be a way to do
>>> this? -Thanks
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>

--
Todd Lipcon
Software Engineer, Cloudera
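
To make the two-rack example from the reply concrete, here is a minimal
sketch of the source-replica choice. It is not HDFS or Balancer code:
pick_replica_to_move and the dn1..dn4 node names are hypothetical, and
"never reduce the number of racks that hold a replica" is only an
assumed stand-in for the NameNode's real rack replication policy.

```python
def pick_replica_to_move(replica_racks, target_node, target_rack):
    """Pick which existing replica to relocate to target_node.

    replica_racks maps each DataNode currently holding the block to its rack.
    Returns a DataNode name, or None if the block is already local.
    Hypothetical helper for illustration only; not an HDFS API.
    """
    if target_node in replica_racks:
        return None  # the block is already local to the requester

    racks_before = set(replica_racks.values())
    for source in sorted(replica_racks):
        # Racks that would hold a replica if `source` gave its copy away.
        racks_after = {rack for node, rack in replica_racks.items()
                       if node != source}
        racks_after.add(target_rack)
        # Assumed policy stand-in: never reduce the number of racks covered.
        if len(racks_after) >= len(racks_before):
            return source
    return None


# Rack A holds two replicas, rack B holds one; dn4 (rack A) wants a local copy.
replicas = {"dn1": "A", "dn2": "A", "dn3": "B"}
choice = pick_replica_to_move(replicas, "dn4", "A")
print(choice)
```

With the layout from the email, the sketch picks one of the rack A
replicas as the source, because moving the rack B replica would leave
all three copies on rack A.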