Hi there, I was running a mapreduce job (via Hive) against HBase, and noticed that I wasn't getting any locality (the input split location and the task tracker machine in the job tracker UI were always different, and "Rack-local map tasks" in the job counters was 0).
I tracked this down to a discrepancy in the way hostnames were being compared. The task tracker detail had a Host like /f/s/1.2.3.4/h.s.f.com. (with trailing dot) But the Input Split Location had /f/s/1.2.3.4/h.s.f.com (without trailing dot) I tweaked the Hive storage handler to add a dot to the end of the TableSplit.getLocation() returned by HBase's TableInputFormatBase, and that way I was successfully able to achieve locality after retesting. So I am guessing this is probably due to something anomalous with my test cluster's configuration, but in case other people are hitting the same thing, it could be addressed by either (a) making the Hadoop locality code use a hostname comparison which is insensitive to the presence of the trailing dot or (b) making the HBase split's hostname consistent with the task tracker Any opinions? JVS