[
https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479361#comment-13479361
]
stack commented on HBASE-5974:
------------------------------
Like Lars and Todd above I almost said why RegionScannerHolder? Is it because
if you add this class, you can do the seqid management in the regionserver and
CPs do not have to be concerned with its management? That is fair enough and
this is pretty elegant soln to the issue.
Your worry is that a CP might return its own RegionScanner implementation. Can
you not add to RegionScanner to methods, getNextSeq, and incrementNextSeq?
Then the regionserver would have the means for keeping up the seqid in whatever
the implementation rather than introduce this wrapper class? RegionScanner is
marked as private. For the trunk patch, we are requiring a restart and
allowing that CPs may need modification to make the leap from 0.94 to 0.96.
Does that make it easier on you modifying RegionScanner to support nextSeq?
Yeah, I think that rather than:
{code}
+ optional uint64 callSeq = 6;
{code}
it should be called nextSeq or nextCallSeq (the latter might be better because
nextSeq might make you think the next sequence id rather than the sequence id
for the 'next' call). All references should be changed if you change this I"d
say Anoop.
This test, TestClientScannerRPCTimeout, looks good. Does it have to be in a
separate class? Do we not have a scanner test already that has a running
minicluster? I can understand though if you want to keep this separate because
its hard to integrate into a current test suite. If you are going to have a
class for this single test and are going to go to the bother of spinning up a
whole minicluster, I'd say do a bit more testing. Are there other scenarios
you can manufacture? Can you add assert that you actually slept?
Good stuff Anoop. Lets get this important patch in.
> Scanner retry behavior with RPC timeout on next() seems incorrect
> -----------------------------------------------------------------
>
> Key: HBASE-5974
> URL: https://issues.apache.org/jira/browse/HBASE-5974
> Project: HBase
> Issue Type: Bug
> Components: Client, regionserver
> Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
> Reporter: Todd Lipcon
> Assignee: Anoop Sam John
> Priority: Critical
> Fix For: 0.96.0
>
> Attachments: 5974_94-V4.patch, 5974_trunk.patch, 5974_trunk-V2.patch,
> HBASE-5974_0.94.patch, HBASE-5974_94-V2.patch, HBASE-5974_94-V3.patch
>
>
> I'm seeing the following behavior:
> - set RPC timeout to a short value
> - call next() for some batch of rows, big enough so the client times out
> before the result is returned
> - the HConnectionManager stuff will retry the next() call to the same server.
> At this point, one of two things can happen: 1) the previous next() call will
> still be processing, in which case you get a LeaseException, because it was
> removed from the map during the processing, or 2) the next() call will
> succeed but skip the prior batch of rows.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira