Jonathan Hsieh created HBASE-8473:
-------------------------------------

             Summary: Add note to ref guide about snapshots and EC2 reverse DNS requirements.
                 Key: HBASE-8473
                 URL: https://issues.apache.org/jira/browse/HBASE-8473
             Project: HBase
          Issue Type: Bug
          Components: documentation, snapshots
    Affects Versions: 0.95.0, 0.94.6.1, 0.98.0
            Reporter: Jonathan Hsieh
            Assignee: Jonathan Hsieh


From IRC, from the mighty Jeremy Carroll.

{code}
17:10 <jeremy_carroll> jmhsieh: I think I found the root cause. All my region 
servers reach the barrier, but it does not continue.
17:11 <jeremy_carroll> jmhsieh: All RS have this in their logs: 2013-05-01 
00:04:56,356 DEBUG org.apache.hadoop.hbase.procedure.Subprocedure: Subprocedure 
'backup1' coordinator notified of 'acquire', waiting on 'reached' or 'abort' 
from coordinator.
17:11 <jeremy_carroll> jmhsieh: Then the coordinator (Master) never sends 
anything. They just sit until the timeout.
17:12 <jeremy_carroll> jmhsieh: So basically 'reached' is never obtained. Then 
abort is set, and it fails.
...
17:24 <jeremy_carroll> jmhsieh: Found the bug. The hostnames don't match the 
master due to DNS resolution
17:25 <jeremy_carroll> jmhsieh: The barrier acquired is putting in the local 
hostname from the regionservers. In EC2 (where reverse DNS does not work well), 
the master hands the internal name to the client.
17:25 <jeremy_carroll> jmhsieh: 
https://s3.amazonaws.com/uploads.hipchat.com/23947/185789/au94meik0h3y5ii/Screen%20Shot%202013-04-30%20at%2017.25.50.png
 
17:26 <jeremy_carroll> jmhsieh: So it's waiting for something like 
'ip-10-155-208-202.ec2.internal,60020,1367366580066' zNode to show up, but 
instead 'hbasemetaclustera-d1b0a484,60020,1367366580066,' is being inserted. 
Barrier is not reached
17:27 <jeremy_carroll> jmhsieh: Reason being in our environment the master does 
not have a reverse DNS entry. So we get stuff like this on RegionServer startup 
in our logs.
17:27 <jeremy_carroll> jmhsieh: 2013-05-01 00:03:00,614 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname 
to use. Was=hbasemetaclustera-d1b0a484, Now=ip-10-155-208-202.ec2.internal
17:54 <jeremy_carroll> jmhsieh: That was it. Verified. Now that reverse DNS is 
working, snapshots are working. Now to figure out how to get reverse DNS 
working on Route53. I wish there was something like 'slave.host.name' inside 
of Hadoop for this. Looking at source code.
{code}
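
The failure described in the log above comes down to a name mismatch: the coordinator waits for a member named after the hostname the master resolved, while the region server registers under its local hostname. The following is a minimal sketch of that mismatch plus a forward/reverse DNS consistency check. It is a simplified model, not HBase's procedure code; the function names (server_name, missing_members, reverse_dns_matches) are hypothetical and exist only for illustration. Only the hostnames, port, and start code come from the log.

```python
import socket


def server_name(host, port, start_code):
    """Build a 'host,port,startcode' member name like the ones quoted in the log."""
    return f"{host},{port},{start_code}"


def missing_members(expected, joined):
    """Return expected members that never joined.

    In this simplified model the barrier is reached only when the result is empty.
    """
    return sorted(set(expected) - set(joined))


def reverse_dns_matches(hostname,
                        forward=socket.gethostbyname,
                        reverse=socket.gethostbyaddr):
    """Return True if hostname -> IP -> name resolves back to the same name.

    The resolver functions are injectable (an assumption made for testability);
    by default the standard library resolvers are used.
    """
    try:
        ip = forward(hostname)
        name = reverse(ip)[0]
    except OSError:  # covers socket.gaierror and socket.herror
        return False
    return name.lower() == hostname.lower()


if __name__ == "__main__":
    # What the coordinator waits for (name as resolved by the master).
    expected = [server_name("ip-10-155-208-202.ec2.internal", 60020, 1367366580066)]
    # What the region server actually registers (its local hostname).
    joined = [server_name("hbasemetaclustera-d1b0a484", 60020, 1367366580066)]
    # A non-empty list means the barrier is never reached.
    print(missing_members(expected, joined))
    # Run on each node to see whether its own name round-trips through DNS.
    print(reverse_dns_matches(socket.gethostname()))
```

Running a check like reverse_dns_matches on the master and on every region server before taking a snapshot would flag hosts whose names do not round-trip through DNS, which is the condition that kept the barrier from being reached here.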

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
