[ 
https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479361#comment-13479361
 ] 

stack commented on HBASE-5974:
------------------------------

Like Lars and Todd above I almost said why RegionScannerHolder?  Is it because 
if you add this class, you can do the seqid management in the regionserver and 
CPs do not have to be concerned with its management?  That is fair enough and 
this is pretty elegant soln to the issue.

Your worry is that a CP might return its own RegionScanner implementation.  Can 
you not add to RegionScanner to methods, getNextSeq, and incrementNextSeq?  
Then the regionserver would have the means for keeping up the seqid in whatever 
the implementation rather than introduce this wrapper class?  RegionScanner is 
marked as private.  For the trunk patch, we are requiring a restart and 
allowing that CPs may need modification to make the leap from 0.94 to 0.96.  
Does that make it easier on you modifying RegionScanner to support nextSeq?

Yeah, I think that rather than:

{code}
+  optional uint64 callSeq = 6;
{code}

it should be called nextSeq or nextCallSeq (the latter might be better because 
nextSeq might make you think the next sequence id rather than the sequence id 
for the 'next' call).  All references should be changed if you change this I"d 
say Anoop.

This test, TestClientScannerRPCTimeout, looks good.  Does it have to be in a 
separate class?  Do we not have a scanner test already that has a running 
minicluster?  I can understand though if you want to keep this separate because 
its hard to integrate into a current test suite.  If you are going to have a 
class for this single test and are going to go to the bother of spinning up a 
whole minicluster, I'd say do a bit more testing.  Are there other scenarios 
you can manufacture?  Can you add assert that you actually slept?

Good stuff Anoop.  Lets get this important patch in.




                
> Scanner retry behavior with RPC timeout on next() seems incorrect
> -----------------------------------------------------------------
>
>                 Key: HBASE-5974
>                 URL: https://issues.apache.org/jira/browse/HBASE-5974
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, regionserver
>    Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
>            Reporter: Todd Lipcon
>            Assignee: Anoop Sam John
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 5974_94-V4.patch, 5974_trunk.patch, 5974_trunk-V2.patch, 
> HBASE-5974_0.94.patch, HBASE-5974_94-V2.patch, HBASE-5974_94-V3.patch
>
>
> I'm seeing the following behavior:
> - set RPC timeout to a short value
> - call next() for some batch of rows, big enough so the client times out 
> before the result is returned
> - the HConnectionManager stuff will retry the next() call to the same server. 
> At this point, one of two things can happen: 1) the previous next() call will 
> still be processing, in which case you get a LeaseException, because it was 
> removed from the map during the processing, or 2) the next() call will 
> succeed but skip the prior batch of rows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to