[ https://issues.apache.org/jira/browse/HBASE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690425#comment-13690425 ]
Matteo Bertozzi commented on HBASE-8783:
----------------------------------------
{quote}ZKProcedureUtil does not need memberName? Was just a nice-to-have?{quote}
ZKProcedureUtil operates only on the procedure, not on the member.
ZKProcedureMemberRpcs (the user of ZKProcedureUtil) is the class that should
know the memberName (coordName).
I think both class names are fairly self-explanatory; my guess is that the
memberName in ZKProcedureUtil was simply something that was never cleaned up.
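
To illustrate the split of responsibilities described above, here is a minimal sketch. ProcedurePaths and MemberRpcs are hypothetical stand-ins, not the real ZKProcedureUtil and ZKProcedureMemberRpcs; only the znode layout is taken from the log in the issue description.

```java
public class ProcedurePathsSketch {

    /** Hypothetical procedure-scoped helper: knows znodes and procedure names, never a member. */
    static final class ProcedurePaths {
        private final String baseZNode;

        ProcedurePaths(String baseZNode) {
            this.baseZNode = baseZNode;
        }

        /** Barrier node for a procedure, e.g. /hbase/online-snapshot/acquired/foo23. */
        String acquiredBarrier(String procName) {
            return baseZNode + "/acquired/" + procName;
        }
    }

    /** Hypothetical member-side helper: the only place that holds the memberName. */
    static final class MemberRpcs {
        private final ProcedurePaths paths;
        private final String memberName;

        MemberRpcs(ProcedurePaths paths, String memberName) {
            this.paths = paths;
            this.memberName = memberName;
        }

        /** Node this member would create under the procedure's acquired barrier. */
        String acquiredNodeFor(String procName) {
            return paths.acquiredBarrier(procName) + "/" + memberName;
        }
    }

    public static void main(String[] args) {
        ProcedurePaths paths = new ProcedurePaths("/hbase/online-snapshot");
        MemberRpcs member = new MemberRpcs(paths, "srv-5.test.cloudera.com,60020,1371813451915");
        System.out.println(member.acquiredNodeFor("foo23"));
    }
}
```

With this shape the helper can be shared by coordinator and members, since nothing in it depends on who is calling.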
> RSSnapshotManager.ZKProcedureMemberRpcs may be initialized with the wrong
> server name
> -------------------------------------------------------------------------------------
>
> Key: HBASE-8783
> URL: https://issues.apache.org/jira/browse/HBASE-8783
> Project: HBase
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 0.94.8, 0.95.1
> Reporter: Matteo Bertozzi
> Assignee: Matteo Bertozzi
> Priority: Minor
> Fix For: 0.95.2, 0.94.9
>
> Attachments: HBASE-8783-0.94-v0.patch, HBASE-8783-v0.patch
>
>
> The ZKProcedureMemberRpcs of the RegionServerSnapshotManager may be
> initialized with the wrong memberName.
> {code}
> 2013-06-21 05:03:41,732 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Initialize Snapshot Manager
> ...
> 2013-06-21 05:03:41,875 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname to use. Was=0.0.0.0, Now=srv-5.test.cloudera.com
> {code}
> The region server name is used as the memberName, but since the snapshot
> manager is initialized before the RS receives the server name used by the
> master, the zkprocedure will use the wrong name (0.0.0.0).
> This will cause the snapshot to fail with a TimeoutException, since the
> master never sees the RS join under the expected name:
> {code}
> Master:
> ZKProcedureCoordinatorRpcs: Watching for acquire node:/hbase/online-snapshot/acquired/foo23/srv-5.test.cloudera.com,60020,1371813451915
> RS:
> ZKProcedureMemberRpcs: Member: '0.0.0.0,60020,1371814996779' joining acquired barrier for procedure (foo23) in zk
> ...
> org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! Source:Timeout caused Foreign Exception Start:1371798732141, End:1371798792141, diff:60000, max:60000 ms
> {code}
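
The ordering problem described above can be reproduced in isolation. The following is a sketch only: FakeRegionServer, EagerMember and DeferredMember are hypothetical names, not HBase classes, and the start code is reused from the RS log line purely for illustration. Deferring the lookup is one possible remedy, not necessarily what the attached patches do.

```java
import java.util.function.Supplier;

public class MemberNameSketch {

    /** Hypothetical stand-in for a region server; not the real HRegionServer. */
    static final class FakeRegionServer {
        private volatile String hostname = "0.0.0.0"; // placeholder until the master replies
        private final int port;
        private final long startCode;

        FakeRegionServer(int port, long startCode) {
            this.port = port;
            this.startCode = startCode;
        }

        /** Simulates the "Master passed us hostname to use" step from the log. */
        void masterPassedHostname(String newHostname) {
            this.hostname = newHostname;
        }

        String serverName() {
            return hostname + "," + port + "," + startCode;
        }
    }

    /** Builds the acquired-barrier znode path in the shape shown in the log. */
    static String acquiredNode(String procName, String memberName) {
        return "/hbase/online-snapshot/acquired/" + procName + "/" + memberName;
    }

    /** Eager variant: the name is captured once, at construction time. */
    static final class EagerMember {
        private final String memberName;

        EagerMember(FakeRegionServer rs) {
            this.memberName = rs.serverName();
        }

        String join(String procName) {
            return acquiredNode(procName, memberName);
        }
    }

    /** Deferred variant: the name is looked up each time it is needed. */
    static final class DeferredMember {
        private final Supplier<String> memberName;

        DeferredMember(Supplier<String> memberName) {
            this.memberName = memberName;
        }

        String join(String procName) {
            return acquiredNode(procName, memberName.get());
        }
    }

    public static void main(String[] args) {
        FakeRegionServer rs = new FakeRegionServer(60020, 1371814996779L);
        // Both members are created before the master has reported the hostname.
        EagerMember eager = new EagerMember(rs);
        DeferredMember deferred = new DeferredMember(rs::serverName);
        rs.masterPassedHostname("srv-5.test.cloudera.com");
        System.out.println("eager:    " + eager.join("foo23"));
        System.out.println("deferred: " + deferred.join("foo23"));
    }
}
```

The eager member keeps joining under 0.0.0.0, which is the node the master is not watching, while the deferred member picks up the hostname the master assigned.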