[ 
https://issues.apache.org/jira/browse/HBASE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690425#comment-13690425
 ] 

Matteo Bertozzi commented on HBASE-8783:
----------------------------------------

{quote}ZKProcedureUtil does not need memberName? Was just a nice-to-have?{quote}
The class operates only on the procedure, not on the member.
ZKProcedureMemberRpcs (the user of ZKProcedureUtil) is the one that should know 
the memberName (coordName).
I think both class names are self-explanatory; my guess is that the memberName 
in ZKProcedureUtil was simply never cleaned up.
                
> RSSnapshotManager.ZKProcedureMemberRpcs may be initialized with the wrong 
> server name
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-8783
>                 URL: https://issues.apache.org/jira/browse/HBASE-8783
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.94.8, 0.95.1
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>            Priority: Minor
>             Fix For: 0.95.2, 0.94.9
>
>         Attachments: HBASE-8783-0.94-v0.patch, HBASE-8783-v0.patch
>
>
> The ZKProcedureMemberRpcs of the RegionServerSnapshotManager may be 
> initialized with the wrong memberName.
> {code}
> 2013-06-21 05:03:41,732 DEBUG 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Initialize Snapshot 
> Manager
> ...
> 2013-06-21 05:03:41,875 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname 
> to use. Was=0.0.0.0, Now=srv-5.test.cloudera.com
> {code}
> The Region Server name is used as the memberName, but since the snapshot manager 
> is initialized before the RS receives the server name used by the master, the 
> zkprocedure will use the wrong name (0.0.0.0). 
> This will cause the snapshot to fail with a TimeoutException, since the master 
> never sees the RS join under the name it is watching for.
> {code}
> Master:
> ZKProcedureCoordinatorRpcs: Watching for acquire 
> node:/hbase/online-snapshot/acquired/foo23/srv-5.test.cloudera.com,60020,1371813451915
> RS:
> ZKProcedureMemberRpcs: Member: '0.0.0.0,60020,1371814996779' joining acquired 
> barrier for procedure (foo23) in zk
> ...
> org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! 
> Source:Timeout caused Foreign Exception Start:1371798732141, 
> End:1371798792141, diff:60000, max:60000 ms
> {code}
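
To make the ordering problem concrete, here is a minimal, self-contained Java sketch. It is not HBase code and not necessarily what the attached patches do: Server, EagerMember, LazyMember and acquiredNode are hypothetical names made up for illustration, and the start code in the names is just a sample value. It only shows that a member name captured at construction time stays stale, while one looked up on use picks up the hostname passed by the master.

```java
import java.util.function.Supplier;

public class MemberNameOrdering {

    // Hypothetical stand-in for the region server; not the real HRegionServer.
    static class Server {
        // Placeholder name in use before the master reports the hostname.
        private volatile String name = "0.0.0.0,60020,1371814996779";
        String getName() { return name; }
        void setName(String name) { this.name = name; }
    }

    // Reads the server name once, at construction time (the ordering problem).
    static class EagerMember {
        private final String memberName;
        EagerMember(Server server) { this.memberName = server.getName(); }
        String acquiredNode(String procName) {
            return "/hbase/online-snapshot/acquired/" + procName + "/" + memberName;
        }
    }

    // Looks the name up each time it is needed, so a later rename is honored.
    static class LazyMember {
        private final Supplier<String> memberName;
        LazyMember(Supplier<String> memberName) { this.memberName = memberName; }
        String acquiredNode(String procName) {
            return "/hbase/online-snapshot/acquired/" + procName + "/" + memberName.get();
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        // Corresponds to "Initialize Snapshot Manager" in the log above.
        EagerMember eager = new EagerMember(server);
        LazyMember lazy = new LazyMember(server::getName);

        // Corresponds to "Master passed us hostname to use".
        server.setName("srv-5.test.cloudera.com,60020,1371814996779");

        System.out.println(eager.acquiredNode("foo23")); // still uses 0.0.0.0
        System.out.println(lazy.acquiredNode("foo23"));  // uses srv-5.test.cloudera.com
    }
}
```

Moving the initialization so that it runs after the master has passed the hostname would have the same effect as the lazy lookup; the sketch uses a Supplier only because it keeps the example short.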

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
