[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650343#comment-16650343
 ] 

James Taylor commented on OMID-117:
-----------------------------------

{quote}About the retries, what's the worst thing that can happen with this? How 
bad is it to have it like this?
{quote}
Check out the comment I added to RegionConnectionFactory:
{code:java}
// This setting controls how many retries occur on the region server if an
// IOException occurs while trying to access the commit table. Because a
// handler thread will be in use while these retries occur and the client
// will be blocked waiting, it must not tie up the call for longer than
// the client RPC timeout. Otherwise, the client will initiate retries on its
// end, tying up yet another handler thread. It's best if the retries can be
// zero, as in that case the handler is released and the retries occur on the
// client side. In testing, we've seen NoServerForRegionException occur, which
// is a DoNotRetryIOException and so is not retried on the client. It's not
// clear if this is a real issue or a test-only issue.
private static final int DEFAULT_COMMIT_TABLE_ACCESS_ON_READ_RETRIES_NUMBER = 11;
private static final int DEFAULT_COMMIT_TABLE_ACCESS_ON_READ_RETRY_PAUSE = 100;
{code}
As it is with this patch, if retries are necessary to reach the RS hosting the 
commit table, they will occur on the RS handling the scan for up to 48 seconds. 
During this time, the handler thread will be tied up (i.e., no other HBase 
client can use it). If this happens to all the handler threads on the RS, then 
all incoming requests will be queued. For example, non-transactional queries 
would potentially not be processed during this time. If the retries (and 
pauses) occur on the client side instead, non-transactional workloads won't be 
impacted. 
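
The 48-second figure is consistent with HBase's client retry backoff, assuming the multiplier table in HConstants.RETRY_BACKOFF is as reproduced below (it is copied from memory, so verify it against the HBase version in use) and ignoring jitter and the time spent in each RPC attempt. A rough sketch of the arithmetic follows; the class name RetryBudget and the method totalPauseMillis are hypothetical and exist only for this illustration:

```java
public class RetryBudget {
    // Assumption: mirrors HBase's HConstants.RETRY_BACKOFF multipliers.
    // Check the actual table in the HBase version you run.
    static final int[] RETRY_BACKOFF =
        {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    // Sums the pause taken before each of `retries` retries. Jitter and the
    // duration of the RPC attempts themselves are deliberately ignored.
    static long totalPauseMillis(long basePauseMillis, int retries) {
        long total = 0;
        for (int i = 0; i < retries; i++) {
            // Past the end of the table, the last multiplier is reused.
            int idx = Math.min(i, RETRY_BACKOFF.length - 1);
            total += basePauseMillis * RETRY_BACKOFF[idx];
        }
        return total;
    }

    public static void main(String[] args) {
        // Defaults from the quoted constants: 11 retries, 100 ms base pause.
        System.out.println(totalPauseMillis(100, 11) + " ms"); // prints "48100 ms"
    }
}
```

With zero retries the same helper returns 0, which matches the point above: the handler is released right away and any waiting happens on the client.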

Ideally, we'd have a test that reproduces this NoServerForRegionException so 
we can see whether any changes are needed to handle this situation. You might 
be able to repro this by manually splitting the commit table and then 
performing a read against a transactional table. It may also simply occur the 
very first time an RS tries to reach the commit table after the commit table 
is created.

 

> Ensure timeouts are configured low for RPCs to commit table
> -----------------------------------------------------------
>
>                 Key: OMID-117
>                 URL: https://issues.apache.org/jira/browse/OMID-117
>             Project: Apache Omid
>          Issue Type: Bug
>            Reporter: James Taylor
>            Priority: Major
>         Attachments: OMID-117.patch, OMID-117_hbase2.patch, 
> OMID-117_v2.patch, OMID-117_v3.patch, OMID-117_v4.patch, OMID-117_v5.patch, 
> OMID-117_v6.patch, OMID-117_v7.patch
>
>




