[ 
https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4462:
-------------------------

    Attachment: unittest_that_shows_us_retrying_sockettimeout.txt

Here is a unit test that seems to show us retrying socket timeouts.
                
> Properly treating SocketTimeoutException
> ----------------------------------------
>
>                 Key: HBASE-4462
>                 URL: https://issues.apache.org/jira/browse/HBASE-4462
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.90.8
>
>         Attachments: HBASE-4462_0.90.x.patch, 
> unittest_that_shows_us_retrying_sockettimeout.txt
>
>
> SocketTimeoutException is currently treated like any IOE inside of 
> HCM.getRegionServerWithRetries and I think this is a problem. This method 
> should only do retries in cases where we are pretty sure the operation will 
> complete, but with STE we have already waited (by default) 60 seconds and 
> nothing happened.
> I found this while debugging Douglas Campbell's problem on the mailing list 
> where it seemed like he was using the same scanner from multiple threads, but 
> actually it was just the same client doing retries while the first run hadn't 
> even finished yet (that's another problem). You could see the first scanner, 
> then up to two other handlers waiting for it to finish in order to run 
> (because of the synchronization on RegionScanner).
> So what should we do? We could treat STE as a DoNotRetryException and let the 
> client deal with it, or we could retry only once.
> There's also the option of having different behavior for get/put/icv/scan. 
> The issue with operations that modify a cell is that you don't know whether 
> the operation completed or not (the same is true when a RS dies hard after 
> completing, say, a Put but just before returning to the client).
