[jira] [Created] (HBASE-8473) add note to ref guide about snapshots and ec2 reverse dns requirements.

Jonathan Hsieh (JIRA) Tue, 30 Apr 2013 18:17:16 -0700

Jonathan Hsieh created HBASE-8473:
-------------------------------------

             Summary: add note to ref guide about snapshots and ec2 reverse dns 
requirements.
                 Key: HBASE-8473
                 URL: https://issues.apache.org/jira/browse/HBASE-8473
             Project: HBase
          Issue Type: Bug
          Components: documentation, snapshots
    Affects Versions: 0.95.0, 0.94.6.1, 0.98.0
            Reporter: Jonathan Hsieh
            Assignee: Jonathan Hsieh

>From IRC from mighty Jeremy Carroll.

{code}
17:10 <jeremy_carroll> jmhsieh: I think I found the root cuase. All my region
servers reach the barrier, but it does not continue.
17:11 <jeremy_carroll> jmhsieh: All RS have this in their logs: 2013-05-01
00:04:56,356 DEBUG org.apache.hadoop.hbase.procedure.Subprocedure: Subprocedure
'backup1' coordinator notified of 'acquire', waiting on 'reached' or 'abort'
from coordinator.
17:11 <jeremy_carroll> jmhsieh: Then the coordinator (Master) never sends
anything. They just sit until the timeout.
17:12 <jeremy_carroll> jmhsieh: So basically 'reached' is never obtained. Then
abort it set, and it fails.
...
17:24 <jeremy_carroll> jmhsieh: Found the bug. The hostnames dont match the
master due to DNS resolution
17:25 <jeremy_carroll> jmhsieh: The barrier aquired is putting in the local
hostname from the regionservers. In EC2 (Where reverse DNS does not work well),
the master hands the internal name to the client.
17:25 <jeremy_carroll> jmhsieh:
https://s3.amazonaws.com/uploads.hipchat.com/23947/185789/au94meik0h3y5ii/Screen%20Shot%202013-04-30%20at%2017.25.50.png

17:26 <jeremy_carroll> jmhsieh: So it's waiting for something like
'ip-10-155-208-202.ec2.internal,60020,1367366580066' zNode to show up, but
instead 'hbasemetaclustera-d1b0a484,60020,1367366580066,' is being inserted.
Barrier is not reached
17:27 <jeremy_carroll> jmhsieh: Reason being in our environment the master does
not have a reverse DNS entry. So we get stuff like this on RegionServer startup
in our logs.
17:27 <jeremy_carroll> jmhsieh: 2013-05-01 00:03:00,614 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname
to use. Was=hbasemetaclustera-d1b0a484, Now=ip-10-155-208-202.ec2.internal
17:54 <jeremy_carroll> jmhsieh: That was it. Verified. Now that Reverse DNS is
working, snapshots are working. Now how to figure out how to get Reverse DNS
working on Route53. I wished there was something like 'slave.host.name' inside
of Hadoop for this. Looking at source code.
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HBASE-8473) add note to ref guide about snapshots and ec2 reverse dns requirements.

Reply via email to