Hi there,

I was running a mapreduce job (via Hive) against HBase, and noticed that I 
wasn't getting any locality (the input split location and the task tracker 
machine in the job tracker UI were always different, and "Rack-local map tasks" 
in the job counters was 0).

I tracked this down to a discrepancy in the way hostnames were being compared.

The task tracker detail had a Host like

/f/s/1.2.3.4/h.s.f.com.

(with trailing dot)

But the Input Split Location had

/f/s/1.2.3.4/h.s.f.com

(without trailing dot)

I tweaked the Hive storage handler to add a dot to the end of the 
TableSplit.getLocation() returned by HBase's TableInputFormatBase, and that way 
I was successfully able to achieve locality after retesting.

So I am guessing this is probably due to something anomalous with my test 
cluster's configuration, but in case other people are hitting the same thing, 
it could be addressed by either

(a) making the Hadoop locality code use a hostname comparison which is 
insensitive to the presence of the trailing dot

or

(b) making the HBase split's hostname consistent with the task tracker

Any opinions?

JVS

Reply via email to