Victor, by default HDFS writes the first replica to the local datanode, so it only takes a few compactions for a region's files to be hosted entirely on the same machine as the region server serving it. Worst case is 24 hours, the interval between major compactions.
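If you don't want to wait for the periodic cycle, you can kick off a major compaction yourself. A minimal sketch, assuming the 0.20-era client API; the table name "mytable" is just an example:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ForceMajorCompaction {
  public static void main(String[] args) throws Exception {
    // Connects using the hbase-site.xml found on the classpath.
    HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
    // Asks the region servers to major-compact every region of the table.
    // The compaction rewrites each region's store files, and HDFS places
    // the first replica of the rewritten files on the local datanode.
    admin.majorCompact("mytable");
  }
}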
J-D

On Thu, Feb 18, 2010 at 7:36 PM, Victor Hsieh <victorhs...@gmail.com> wrote:
> One tricky thing is that if the region size is larger (default max
> size is 256MB) than the HDFS block size (default 64MB), it's still
> necessary to go through the network.
>
> Victor
>
> On Thu, Feb 18, 2010 at 12:22 AM, Jean-Daniel Cryans
> <jdcry...@apache.org> wrote:
>> Bryan,
>>
>> What you are describing is already implemented, and from my experience >90%
>> of my tasks are usually run on the region server that hosts the mapped region.
>>
>> See o.a.h.h.mapreduce.TableSplit.getLocations()
>>
>> J-D
>>
>> On Wed, Feb 17, 2010 at 12:10 AM, Bryan McCormick <br...@readpath.com> wrote:
>>
>>> Quick question about data-local vs rack-local tasks when running map reduce
>>> jobs against HBase. I've just run a job against a table that was split into
>>> 1,645 tasks. Looking at the job page, it reports that 1,445 of those tasks
>>> were rack local compared to 200 that were data local. I'm taking these
>>> counters to mean that most of the tasks were running on a server that wasn't
>>> the same as the relevant region server. Is it possible, or are there plans,
>>> to add some logic to the scheduler so it prefers running tasks on the same
>>> server as the region server if possible?
>>>
>>> With HBase, is there a similar way to tell if a region on a region server has
>>> a copy of the files it needs to serve the region on a local datanode,
>>> instead of having to cross the network to get them?
>>>
>>> I know that when you're writing new data into a table and it splits, the
>>> default is for the first datanode copy to be local. But after a fairly
>>> large table has been brought up and down several times, with all of the
>>> regions being reassigned, is there logic when assigning regions to put them
>>> on a data-local server?
>>>
>>> Thanks,
>>> Bryan
>>>
>>
>
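On Bryan's second question (telling whether a region's store files have replicas on the datanode co-located with its region server): there is no built-in report, but you can compute it from HDFS block locations. A rough sketch, assuming the usual /hbase/<table>/<region>/<family>/<storefile> layout; the region path and host name below are made-up examples, adjust them to your cluster:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RegionLocality {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Hypothetical region directory and the host currently serving it.
    Path regionDir = new Path("/hbase/mytable/1234567890abcdef");
    String serverHost = "rs1.example.com";
    long local = 0, total = 0;
    // A region directory holds one subdirectory per column family,
    // each containing the store files.
    for (FileStatus family : fs.listStatus(regionDir)) {
      if (!family.isDir()) continue;
      for (FileStatus store : fs.listStatus(family.getPath())) {
        BlockLocation[] blocks =
            fs.getFileBlockLocations(store, 0, store.getLen());
        for (BlockLocation b : blocks) {
          total++;
          for (String host : b.getHosts()) {
            if (host.equals(serverHost)) { local++; break; }
          }
        }
      }
    }
    System.out.println("Blocks local to " + serverHost + ": "
        + local + "/" + total);
  }
}

If most blocks are not local, a major compaction of that region (or of the whole table, as in the earlier sketch) rewrites the files and restores locality.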