Matteo Bertozzi created HBASE-8783:
--------------------------------------

             Summary: RSSnapshotManager.ZKProcedureMemberRpcs may be 
initialized with the wrong server name
                 Key: HBASE-8783
                 URL: https://issues.apache.org/jira/browse/HBASE-8783
             Project: HBase
          Issue Type: Bug
          Components: snapshots
    Affects Versions: 0.95.1, 0.94.8
            Reporter: Matteo Bertozzi
            Assignee: Matteo Bertozzi
            Priority: Minor
             Fix For: 0.95.2, 0.94.9
         Attachments: HBASE-8783-0.94-v0.patch

The ZKProcedureMemberRpcs of the RegionServerSnapshotManager may be initialized 
with the wrong memberName.

{code}
2013-06-21 05:03:41,732 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: Initialize Snapshot Manager
...
2013-06-21 05:03:41,875 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname 
to use. Was=0.0.0.0, Now=srv-5.test.cloudera.com
{code}

The region server name is used as the memberName, but because the snapshot manager is 
initialized before the RS receives the server name used by the master, the 
ZKProcedureMemberRpcs will use the wrong name (0.0.0.0). 
This causes the snapshot to fail with a TimeoutException, since the master 
never sees the RS join under the server name it is watching for:
{code}
Master:
ZKProcedureCoordinatorRpcs: Watching for acquire 
node:/hbase/online-snapshot/acquired/foo23/srv-5.test.cloudera.com,60020,1371813451915

RS:
ZKProcedureMemberRpcs: Member: '0.0.0.0,60020,1371814996779' joining acquired 
barrier for procedure (foo23) in zk

...
org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! 
Source:Timeout caused Foreign Exception Start:1371798732141, End:1371798792141, 
diff:60000, max:60000 ms
{code}
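
The ordering problem can be illustrated with a minimal, self-contained sketch. It does not use any HBase classes and is not necessarily what the attached patch does: Server, EagerMember and LazyMember are hypothetical stand-ins, and the only values taken from the logs above are the znode path, the hostnames and the port. The sketch assumes the member name is either captured once at initialization (the failing case) or looked up each time it is needed (one possible way to avoid the mismatch).

```java
import java.util.function.Supplier;

public class MemberNameSketch {

    /** Hypothetical stand-in for a region server whose hostname is later corrected by the master. */
    static final class Server {
        private volatile String hostname = "0.0.0.0";
        private final int port;
        private final long startCode;

        Server(int port, long startCode) {
            this.port = port;
            this.startCode = startCode;
        }

        /** Simulates "Master passed us hostname to use". */
        void masterPassedHostname(String newHostname) {
            this.hostname = newHostname;
        }

        String serverName() {
            return hostname + "," + port + "," + startCode;
        }
    }

    /** Captures the member name at construction time, so a later hostname change is missed. */
    static final class EagerMember {
        private final String memberName;

        EagerMember(Server server) {
            this.memberName = server.serverName();
        }

        String acquiredNode(String procedure) {
            return "/hbase/online-snapshot/acquired/" + procedure + "/" + memberName;
        }
    }

    /** Resolves the member name on every use, so it reflects the name the master knows. */
    static final class LazyMember {
        private final Supplier<String> memberName;

        LazyMember(Server server) {
            this.memberName = server::serverName;
        }

        String acquiredNode(String procedure) {
            return "/hbase/online-snapshot/acquired/" + procedure + "/" + memberName.get();
        }
    }

    public static void main(String[] args) {
        Server rs = new Server(60020, 1371813451915L);

        // Both members are created before the master reports the real hostname.
        EagerMember eager = new EagerMember(rs);
        LazyMember lazy = new LazyMember(rs);

        rs.masterPassedHostname("srv-5.test.cloudera.com");

        System.out.println("eager: " + eager.acquiredNode("foo23"));
        System.out.println("lazy:  " + lazy.acquiredNode("foo23"));
    }
}
```

In this sketch the eager member keeps joining under 0.0.0.0 while the lazy member joins under the node the master is watching. Deferring initialization of the snapshot manager until after the hostname is known would be another way to get the same effect.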
