> Keep in mind there's a fair bit of subtlety to it -- eg what happens
> if you have two racks: A with 2 replicas, and B with one replica. A
> node in rack A requests a local replica. In this case we have to make
> sure that we move one of the A replicas and not the B replica (ie we
> must respect the NN's rack replication policy).

Yes, good point. Also, I wonder how HDFS handles the resulting
over-replication of the file. Will it try to delete the over-replicated
blocks? If so, we'd need to ensure [somehow] that this doesn't happen.

On Thu, May 26, 2011 at 12:30 PM, Todd Lipcon <t...@cloudera.com> wrote:
> On Thu, May 26, 2011 at 12:02 PM, Jason Rutherglen
> <jason.rutherg...@gmail.com> wrote:
>> Todd, thanks!
>>
>>> In general, though, keep in mind that, whenever you write data, you'll
>>> get a local copy first, if the writer is in the cluster. That's how
>>> HBase gets locality for most of its accesses
>>
>> Right. However in the failover scenario where a node goes down
>> (hardware failure, or either of the processes, such as the DataNode,
>> RegionServer, etc), then I think the new RS will not have local data?
>> We could first make a request that all necessary HDFS files go local
>> prior to the new RS being available. At least for search to work this
>> is a requirement.
>
> Yep, we've thrown this idea around before in the past, but not sure if
> there's an HBASE JIRA for it or not.
>
>>
>>> There are some non-public APIs to do this -- have a look at how the
>>> Balancer works - the dispatch() function is the guts you're looking
>>> for. It might be nice to expose this functionality as a "limited
>>> private evolving" API
>>
>> Perhaps simply mark them as 'expert' or make them package private?
>> I'll work on a patch.
>
> Sounds good.
>
> Keep in mind there's a fair bit of subtlety to it -- eg what happens
> if you have two racks: A with 2 replicas, and B with one replica. A
> node in rack A requests a local replica. In this case we have to make
> sure that we move one of the A replicas and not the B replica (ie we
> must respect the NN's rack replication policy).
>
> -Todd
>
>> On Thu, May 26, 2011 at 11:40 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>> Hey Jason,
>>>
>>> There are some non-public APIs to do this -- have a look at how the
>>> Balancer works - the dispatch() function is the guts you're looking
>>> for. It might be nice to expose this functionality as a "limited
>>> private evolving" API.
>>>
>>> In general, though, keep in mind that, whenever you write data, you'll
>>> get a local copy first, if the writer is in the cluster. That's how
>>> HBase gets locality for most of its accesses.
>>>
>>> -Todd
>>>
>>> On Thu, May 26, 2011 at 11:36 AM, Jason Rutherglen
>>> <jason.rutherg...@gmail.com> wrote:
>>>> Is there a way to send a request to the name node to replicate
>>>> block(s) to a specific DataNode? If not, what would be a way to do
>>>> this? -Thanks
>>>>
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
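
For what it's worth, here is a rough sketch of the rack constraint from
the two-rack example quoted above (rack A with 2 replicas, rack B with
one). It does not use any HDFS or Balancer API. The class ReplicaMoveSketch,
the Replica type and chooseSource() are all hypothetical names made up for
illustration, and the rule it applies (never reduce the number of distinct
racks holding the block) is only a simplified stand-in for the NN's real
placement policy.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Illustration only: not an HDFS API. All names here are hypothetical. */
public class ReplicaMoveSketch {

    /** Stand-in for a DataNode holding (or about to hold) a replica. */
    public static final class Replica {
        public final String node;
        public final String rack;

        public Replica(String node, String rack) {
            this.node = node;
            this.rack = rack;
        }
    }

    /** Number of distinct racks covered by the given replicas. */
    static int rackCount(List<Replica> replicas) {
        Set<String> racks = new HashSet<>();
        for (Replica r : replicas) {
            racks.add(r.rack);
        }
        return racks.size();
    }

    /**
     * Picks an existing replica to move to the target node such that the
     * block spans at least as many racks after the move as before it.
     * Returns empty if the target already holds a replica or if no move
     * satisfies that (simplified) constraint.
     */
    public static Optional<Replica> chooseSource(List<Replica> replicas, Replica target) {
        for (Replica r : replicas) {
            if (r.node.equals(target.node)) {
                return Optional.empty(); // already local, nothing to move
            }
        }
        int before = rackCount(replicas);
        for (Replica candidate : replicas) {
            // Simulate the move: drop the candidate, add the target.
            List<Replica> after = new ArrayList<>(replicas);
            after.remove(candidate);
            after.add(target);
            if (rackCount(after) >= before) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

With replicas on two nodes in rack A and one node in rack B, and a target
node in rack A, chooseSource() returns one of the rack A replicas: moving
the rack B replica would leave every copy in rack A, so that candidate is
rejected. A real patch would have to defer to the NN's placement policy
rather than this simple rack count.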