[ 
https://issues.apache.org/jira/browse/HBASE-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384755#comment-15384755
 ] 

Gary Helmling commented on HBASE-16095:
---------------------------------------

bq. The secondary index recovery mechanism depends on the index region(s) being 
online. The writes are happening in a blocking manner, so we block the actual 
region opener thread. Since the same region opener threads are used to open 
both data and index regions, deadlock happens.

Then the "deadlock" here is entirely due to Phoenix's handling, and I don't 
think it's something we should be trying to address with HBase.  We've always 
said that doing blocking operations in coprocessor hooks is bad practice.  I 
don't think trying to paper over that for a specific use-case here really helps 
HBase.  Trying to impose ordering on operations in a distributed system just 
adds complexity and problems.
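
To make the failure mode concrete, here is a minimal, self-contained Java 
sketch (illustrative only, not HBase code; the pool size of 3 matches the 
default opener thread count mentioned in the description) of how blocking 
inside the shared opener pool self-deadlocks:

{code:java}
import java.util.concurrent.*;

// Illustrative only: a bounded "region opener" pool where data-region opens
// block on an index-region open that is queued behind them.
public class OpenerDeadlockDemo {
  public static void main(String[] args) throws Exception {
    ExecutorService openers = Executors.newFixedThreadPool(3);
    CountDownLatch indexOnline = new CountDownLatch(1);

    // Three data-region opens grab all three workers and block, each
    // "replaying WAL edits" to an index region that is not open yet.
    for (int i = 0; i < 3; i++) {
      openers.submit(() -> {
        try {
          indexOnline.await(); // the blocking write to the index region
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }

    // The index-region open sits in the queue forever: no worker is free.
    openers.submit(indexOnline::countDown);

    openers.shutdown();
    // Never finishes; every opener thread is blocked on a queued task.
    System.out.println("opened? " + openers.awaitTermination(2, TimeUnit.SECONDS));
  }
}
{code}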

I think the better way for Phoenix to approach this is to fail the region open 
for the data table if the required index region is not online yet.  Yes, this 
is a problem with current HBase, where regions that go into FAILED_OPEN are 
never retried by assignment manager.  If we fix this so that FAILED_OPEN 
regions just get retried later, you wind up with a convergent system where each 
region can open naturally when its dependencies are online.  This will be much 
more robust than trying to do ordering for all possible cases, and makes HBase 
better for everyone.  We're doing some work towards this in HBASE-16209; we'd 
appreciate any thoughts over there.
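
As a rough sketch of that fail-fast alternative: the preOpen hook below is the 
real RegionObserver API, but the index-availability check is hypothetical, and 
this assumes the HBASE-16209 work so that FAILED_OPEN regions get retried:

{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;

// Sketch: fail the data-region open when its index dependency is not online,
// rather than blocking the opener thread waiting for it.
public class FailFastIndexObserver extends BaseRegionObserver {
  @Override
  public void preOpen(ObserverContext<RegionCoprocessorEnvironment> ctx)
      throws IOException {
    if (!indexRegionsOnline(ctx.getEnvironment())) {
      // Throwing aborts this open attempt; with retry of FAILED_OPEN regions
      // in place, the open converges once the index region comes up.
      throw new IOException("required index region not online; failing open");
    }
  }

  // Hypothetical check; a real implementation would consult meta / the index
  // table's region states.
  private boolean indexRegionsOnline(RegionCoprocessorEnvironment env) {
    return false; // placeholder
  }
}
{code}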

> Add priority to TableDescriptor and priority region open thread pool
> --------------------------------------------------------------------
>
>                 Key: HBASE-16095
>                 URL: https://issues.apache.org/jira/browse/HBASE-16095
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 0.98.21
>
>         Attachments: HBASE-16095-0.98.patch, HBASE-16095-0.98.patch, 
> hbase-16095_v0.patch, hbase-16095_v1.patch, hbase-16095_v2.patch, 
> hbase-16095_v3.patch
>
>
> This is in a similar area to HBASE-15816, and is also required by the 
> current secondary indexing for Phoenix. 
> The problem with Phoenix secondary indexes is that data table regions depend 
> on 
> index regions to be able to make progress. Possible distributed deadlocks can 
> be prevented via custom RpcScheduler + RpcController configuration via 
> HBASE-11048 and PHOENIX-938. However, region opening also has the same 
> deadlock situation, because data region open has to replay the WAL edits to 
> the index regions. There is only one thread pool to open regions, with 3 
> workers by default. So if the cluster is recovering / restarting from 
> scratch, the deadlock happens because some index regions cannot be opened: 
> they sit in the same queue behind data regions, whose opens in turn block on 
> RPCs to index regions that are not yet open. This is reproduced in almost 
> all Phoenix secondary index clusters (mutable tables w/o transactions) that 
> we see. 
> The proposal is to have a "high priority" region opening thread pool, and 
> have the HTD carry the relative priority of a table (a dispatch along these 
> lines is sketched after this quoted description). This may be useful for 
> other "framework" level tables from Phoenix, Tephra, Trafodion, etc. if they 
> want some specific tables to come online faster. 
> As a follow-up patch, we can also take a look at how this priority 
> information can be used by the RPC scheduler on the server side or the RPC 
> controller on the client side, so that we do not have to set priorities 
> manually per operation. 
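
A minimal sketch of the dispatch that proposal implies: route open tasks to a 
separate pool based on a priority carried in the table descriptor. The 
"PRIORITY" attribute key, the threshold, and the pool sizes here are 
assumptions for illustration, not the patch's exact wiring:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.hbase.HTableDescriptor;

// Sketch: a high-priority open pool alongside the normal one, so index /
// "framework" table regions can open ahead of the data regions that depend
// on them.
public class PriorityOpenDispatcher {
  private final ExecutorService normalPool = Executors.newFixedThreadPool(3);
  private final ExecutorService priorityPool = Executors.newFixedThreadPool(3);

  public void dispatchOpen(HTableDescriptor htd, Runnable openTask) {
    String value = htd.getValue("PRIORITY"); // hypothetical attribute key
    int priority = value == null ? 0 : Integer.parseInt(value);
    // Tables flagged with a positive priority bypass the normal open queue.
    (priority > 0 ? priorityPool : normalPool).submit(openTask);
  }
}
{code}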



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
