Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
---------------------------------------------------------------------------
Key: HBASE-5259
URL: https://issues.apache.org/jira/browse/HBASE-5259
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Assuming HBase and MapReduce run in the same cluster, TableInputFormat
overrides the split function, which divides the regions of a particular table
into a series of mapper tasks, so that each mapper task processes a region or
part of a region. Ideally, a mapper task should run on the same machine as the
region server that hosts the corresponding region. This is why
TableInputFormat sets the RegionLocation: so that the MapReduce framework can
respect node locality.
The code simply sets the host name of the region server as the HRegionLocation.
However, the host name of the region server may be in a different format from
the host name of the task tracker (which runs the mapper task). The task tracker
always gets its host name by reverse DNS lookup, and the DNS service may return
the host name in a different format. For example, the host name of the region
server may be correctly set as a.b.c.d while the reverse DNS lookup returns
"a.b.c.d." (with an additional dot at the end).
So the solution is to set the RegionLocation by reverse DNS lookup as well.
Whatever host name format the DNS system uses, TableInputFormat is
responsible for keeping its host name format consistent with that of the
MapReduce framework.
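The idea can be sketched roughly as follows. This is a minimal illustration,
not the actual patch: the class RegionHostNormalizer and its methods
normalize and jdkReverseLookup are hypothetical names, and the JDK's
InetAddress is used only as a stand-in for whatever reverse lookup the task
tracker performs. The lookup is passed in as a function so that the same
resolver the MapReduce side uses could be substituted, and results are cached
because many regions share the same region server.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical helper (not part of HBase): maps a region server host name to
// the name returned by a reverse DNS lookup, so that the split location uses
// the same host name format as the MapReduce framework.
final class RegionHostNormalizer {
    // Cache of already-resolved names; one lookup per distinct region server.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // The reverse lookup to apply; injected so it can match the task tracker's.
    private final UnaryOperator<String> reverseLookup;

    RegionHostNormalizer(UnaryOperator<String> reverseLookup) {
        this.reverseLookup = reverseLookup;
    }

    // Assumption: a JDK-only stand-in for the reverse lookup. It resolves the
    // name to an address and asks the name service for the canonical name.
    // Falls back to the original name if the host cannot be resolved.
    static String jdkReverseLookup(String host) {
        try {
            return InetAddress.getByName(host).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return host;
        }
    }

    // Returns the host name to report as the split location.
    String normalize(String regionServerHost) {
        return cache.computeIfAbsent(regionServerHost, reverseLookup);
    }
}
```

A caller would construct it once, for example with
new RegionHostNormalizer(RegionHostNormalizer::jdkReverseLookup), and pass
each region server host name through normalize before setting it as the
split location.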