[ 
https://issues.apache.org/jira/browse/HBASE-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misty Stanley-Jones reassigned HBASE-8473:
------------------------------------------

    Assignee: Misty Stanley-Jones  (was: Jonathan Hsieh)

> add note to ref guide about snapshots and ec2 reverse dns requirements.
> -----------------------------------------------------------------------
>
>                 Key: HBASE-8473
>                 URL: https://issues.apache.org/jira/browse/HBASE-8473
>             Project: HBase
>          Issue Type: Bug
>          Components: documentation, snapshots
>    Affects Versions: 0.98.0, 0.94.6.1, 0.95.0
>            Reporter: Jonathan Hsieh
>            Assignee: Misty Stanley-Jones
>
> From IRC from mighty Jeremy Carroll.
> {code}
> 17:10 <jeremy_carroll> jmhsieh: I think I found the root cause. All my region 
> servers reach the barrier, but it does not continue.
> 17:11 <jeremy_carroll> jmhsieh: All RS have this in their logs: 2013-05-01 
> 00:04:56,356 DEBUG org.apache.hadoop.hbase.procedure.Subprocedure: 
> Subprocedure 'backup1' coordinator notified of 'acquire', waiting on 
> 'reached' or 'abort' from coordinator.
> 17:11 <jeremy_carroll> jmhsieh: Then the coordinator (Master) never sends 
> anything. They just sit until the timeout.
> 17:12 <jeremy_carroll> jmhsieh: So basically 'reached' is never obtained. 
> Then abort is set, and it fails.
> ...
> 17:24 <jeremy_carroll> jmhsieh: Found the bug. The hostnames don't match the 
> master due to DNS resolution
> 17:25 <jeremy_carroll> jmhsieh: The barrier acquired is putting in the local 
> hostname from the regionservers. In EC2 (Where reverse DNS does not work 
> well), the master hands the internal name to the client.
> 17:25 <jeremy_carroll> jmhsieh: 
> https://s3.amazonaws.com/uploads.hipchat.com/23947/185789/au94meik0h3y5ii/Screen%20Shot%202013-04-30%20at%2017.25.50.png
>  
> 17:26 <jeremy_carroll> jmhsieh: So it's waiting for something like 
> 'ip-10-155-208-202.ec2.internal,60020,1367366580066' zNode to show up, but 
> instead 'hbasemetaclustera-d1b0a484,60020,1367366580066,' is being inserted. 
> Barrier is not reached
> 17:27 <jeremy_carroll> jmhsieh: Reason being in our environment the master 
> does not have a reverse DNS entry. So we get stuff like this on RegionServer 
> startup in our logs.
> 17:27 <jeremy_carroll> jmhsieh: 2013-05-01 00:03:00,614 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname 
> to use. Was=hbasemetaclustera-d1b0a484, Now=ip-10-155-208-202.ec2.internal
> 17:54 <jeremy_carroll> jmhsieh: That was it. Verified. Now that Reverse DNS 
> is working, snapshots are working. Now how to figure out how to get Reverse 
> DNS working on Route53. I wish there was something like 'slave.host.name' 
> inside of Hadoop for this. Looking at source code.
> {code}
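
The failure described in the log above can be sketched as a simple set comparison. This is only a simplified model under stated assumptions, not HBase's actual procedure code: the class and method names (BarrierNameCheck, missingMembers, reverseName) are hypothetical, and the two member names are the ones quoted in the chat. The coordinator waits for one entry per expected member name; if a region server registers under a different name, the expected entry never appears and the barrier is never reached. The reverseName helper shows one way to compare a host's own name with what reverse DNS returns for its address, using only java.net.InetAddress.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; not part of HBase.
public class BarrierNameCheck {

    // Names the coordinator is still waiting on: expected member names
    // that do not appear among the names the members registered under.
    static Set<String> missingMembers(Set<String> expected, Set<String> acquired) {
        Set<String> missing = new TreeSet<>(expected);
        missing.removeAll(acquired);
        return missing;
    }

    // Forward-resolve a host name, then reverse-resolve the raw address.
    // If no reverse entry exists, getCanonicalHostName() falls back to
    // the textual IP address.
    static String reverseName(String host) throws UnknownHostException {
        InetAddress forward = InetAddress.getByName(host);
        return InetAddress.getByAddress(forward.getAddress()).getCanonicalHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        // Name the master expects versus the name the region server used
        // (both taken from the chat log above).
        Set<String> expected = Set.of("ip-10-155-208-202.ec2.internal,60020,1367366580066");
        Set<String> acquired = Set.of("hbasemetaclustera-d1b0a484,60020,1367366580066");
        System.out.println("still waiting on: " + missingMembers(expected, acquired));

        // Compare this machine's own name with its reverse DNS name.
        String local = InetAddress.getLocalHost().getHostName();
        System.out.println("local=" + local + " reverse=" + reverseName(local));
    }
}
```

If the two names printed on the last line differ on a master or region server, the mismatch described in the chat is a plausible cause of a snapshot that times out at the barrier.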



--
This message was sent by Atlassian JIRA
(v6.2#6252)
