[ https://issues.apache.org/jira/browse/HBASE-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060326#comment-14060326 ]
stack commented on HBASE-8473:
------------------------------
Patch is good, and it applies cleanly. We mention reverse DNS as an old requirement
in "2.1.2.2. DNS". Should this section link to it? It might be OK without the
link, so I can just commit. Troubleshooting is a good place for this info, at
least for starters. Let me just commit.
> add note to ref guide about snapshots and ec2 reverse dns requirements.
> -----------------------------------------------------------------------
>
> Key: HBASE-8473
> URL: https://issues.apache.org/jira/browse/HBASE-8473
> Project: HBase
> Issue Type: Bug
> Components: documentation, snapshots
> Affects Versions: 0.98.0, 0.94.6.1, 0.95.0
> Reporter: Jonathan Hsieh
> Assignee: Misty Stanley-Jones
> Attachments: HBASE-8473.patch
>
>
> From IRC, from the mighty Jeremy Carroll.
> {code}
> 17:10 <jeremy_carroll> jmhsieh: I think I found the root cause. All my region
> servers reach the barrier, but it does not continue.
> 17:11 <jeremy_carroll> jmhsieh: All RS have this in their logs: 2013-05-01
> 00:04:56,356 DEBUG org.apache.hadoop.hbase.procedure.Subprocedure:
> Subprocedure 'backup1' coordinator notified of 'acquire', waiting on
> 'reached' or 'abort' from coordinator.
> 17:11 <jeremy_carroll> jmhsieh: Then the coordinator (Master) never sends
> anything. They just sit until the timeout.
> 17:12 <jeremy_carroll> jmhsieh: So basically 'reached' is never obtained.
> Then abort is set, and it fails.
> ...
> 17:24 <jeremy_carroll> jmhsieh: Found the bug. The hostnames don't match the
> master due to DNS resolution.
> 17:25 <jeremy_carroll> jmhsieh: The barrier 'acquired' step is putting in the local
> hostname from the regionservers. In EC2 (where reverse DNS does not work
> well), the master hands the internal name to the client.
> 17:25 <jeremy_carroll> jmhsieh:
> https://s3.amazonaws.com/uploads.hipchat.com/23947/185789/au94meik0h3y5ii/Screen%20Shot%202013-04-30%20at%2017.25.50.png
>
> 17:26 <jeremy_carroll> jmhsieh: So it's waiting for something like
> 'ip-10-155-208-202.ec2.internal,60020,1367366580066' zNode to show up, but
> instead 'hbasemetaclustera-d1b0a484,60020,1367366580066,' is being inserted.
> Barrier is not reached
> 17:27 <jeremy_carroll> jmhsieh: Reason being in our environment the master
> does not have a reverse DNS entry. So we get stuff like this on RegionServer
> startup in our logs.
> 17:27 <jeremy_carroll> jmhsieh: 2013-05-01 00:03:00,614 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname
> to use. Was=hbasemetaclustera-d1b0a484, Now=ip-10-155-208-202.ec2.internal
> 17:54 <jeremy_carroll> jmhsieh: That was it. Verified. Now that reverse DNS
> is working, snapshots are working. Now to figure out how to get reverse
> DNS working on Route53. I wish there were something like 'slave.host.name'
> inside of Hadoop for this. Looking at the source code.
> {code}
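The failure in the log comes down to a name mismatch: the master waits for a member named 'ip-10-155-208-202.ec2.internal,60020,1367366580066', while the RegionServer joins the barrier as 'hbasemetaclustera-d1b0a484,60020,1367366580066'. A minimal sketch of that effect, assuming a simplified barrier that completes only when every expected member name has joined verbatim. The class `BarrierSketch` and method `barrierReached` are hypothetical and are not HBase's `org.apache.hadoop.hbase.procedure` API; timeouts and the abort path are not modeled.

```java
import java.util.Set;

/**
 * Hypothetical sketch (not HBase's procedure API): the coordinator treats the
 * barrier as reached only when every member it expects has joined under
 * exactly the same name string (host,port,startcode).
 */
public class BarrierSketch {

    /** True only if every expected member name appears among the joined names. */
    static boolean barrierReached(Set<String> expected, Set<String> joined) {
        return joined.containsAll(expected);
    }

    public static void main(String[] args) {
        // Name the master handed to the RegionServer (EC2 internal DNS name).
        String expectedName = "ip-10-155-208-202.ec2.internal,60020,1367366580066";
        // Name the RegionServer used when joining (its local hostname).
        String joinedName = "hbasemetaclustera-d1b0a484,60020,1367366580066";

        // Mismatched names: the coordinator never sees its expected member.
        boolean reached = barrierReached(Set.of(expectedName), Set.of(joinedName));
        System.out.println("barrier reached: " + reached); // prints "barrier reached: false"

        // When both sides resolve the host to the same name, the barrier completes.
        boolean fixed = barrierReached(Set.of(expectedName), Set.of(expectedName));
        System.out.println("barrier reached: " + fixed); // prints "barrier reached: true"
    }
}
```

In the real cluster the coordinator does not fail immediately on a mismatch; as the log shows, it simply never sees the expected member, so every participant waits until the timeout and the snapshot aborts.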
--
This message was sent by Atlassian JIRA
(v6.2#6252)