[ 
https://issues.apache.org/jira/browse/HBASE-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384679#comment-15384679
 ] 

Enis Soztutar commented on HBASE-16095:
---------------------------------------

Thanks Stack for taking a look. 
bq. can't we keep phoenix stuff up in phoenix? Secondary indices via 
transaction are almost here. Isn't that the proper fix rather than adding new 
pools to hbase (we don't need more pools), etc.
Unfortunately no. This happens in region open, so we need a mechanism to inject 
/ configure region opening; it has nothing to do with RPC scheduling. 
bq. Why we need this change if configuring below could address deadlock?
That is a deadlock on RPCs and regular index writes. This particular issue is 
about the writes happening to the index region when we are opening the data 
region. The secondary index recovery mechanism depends on the index region(s) 
being online. The writes happen in a blocking manner, so we block the 
actual region opener thread. Since the same region opener threads are used to 
open both data and index regions, a deadlock happens. 
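
To make the shape of that deadlock concrete, here is a minimal JDK-only sketch. 
It is not HBase code: the class, method, and variable names are hypothetical, 
and a latch stands in for the blocking index write done during data region 
open. It only illustrates how a fixed opener pool can be exhausted by tasks 
that wait on another task stuck in the same queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RegionOpenDeadlockSketch {

    /**
     * Submits dataRegions "data region open" tasks followed by one
     * "index region open" task to a single fixed pool.
     * Returns true if every data region opened before the timeout.
     */
    static boolean tryOpen(int workers, int dataRegions, long timeoutMs) {
        ExecutorService openPool = Executors.newFixedThreadPool(workers);
        CountDownLatch indexOnline = new CountDownLatch(1);
        CountDownLatch dataOpened = new CountDownLatch(dataRegions);
        try {
            for (int i = 0; i < dataRegions; i++) {
                openPool.submit(() -> {
                    try {
                        // Stand-in for the blocking write to the index region.
                        indexOnline.await();
                        dataOpened.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The index region open is queued behind the data region opens.
            openPool.submit(() -> indexOnline.countDown());
            return dataOpened.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupts any workers still blocked on the latch.
            openPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("3 workers, 3 data regions: " + tryOpen(3, 3, 500));
        System.out.println("4 workers, 3 data regions: " + tryOpen(4, 3, 500));
    }
}
```

With 3 workers and 3 data regions, all workers block and the queued index open 
never runs, so the call should time out and report false. With one spare 
worker the index open gets a thread and the call should report true. 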
bq. This sort of dependence amongst regions – i.e. the index has to be online 
before data region can come on line – is not supported in hbase; what happens 
if server carrying index region crashes... and other scenarios, etc. Has it 
been worked through? If so, where can I read about it?
I am not sure where you can read more. There were presentations online, but the 
implementation in Phoenix is some years old and has seen some changes since.
bq. We have a mechanism for onlining important regions already that has loads 
of holes in it (meta, namespace, etc.). The new AMv2 will go a long ways toward 
plugging a bunch of them. In this issue we are proposing a new means of doing a 
similar thing but on an even shakier foundation.
Not quite the same thing. AM / Master can prioritize the opening of regions, 
but we cannot control all the timing from a master perspective. We cannot time 
new tables being created while servers are going down, WAL recovery is 
happening, etc. So there will never be a perfect-and-strict ordering that can be 
done from a master perspective if, for example, we want to ensure index table 
regions are assigned before the data table regions from AM. AM can do a best 
effort job. On the other hand, region servers do not need to order the incoming 
region open requests. If there is no dependency, then having a fixed thread 
pool to open regions works. If there is a dependency, then it does not. 
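
As a rough illustration of the region server side of this, the sketch below 
routes open requests to one of two pools based on a per-table priority. It is 
JDK-only and not the patch itself: the class name, the HIGH_PRIORITY threshold, 
and the pool sizes are assumptions made for the example. 

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PriorityOpenRouterSketch {
    // Hypothetical threshold; tables at or above it use the priority pool.
    static final int HIGH_PRIORITY = 100;

    private final ExecutorService normalPool = Executors.newFixedThreadPool(3);
    private final ExecutorService priorityPool = Executors.newFixedThreadPool(3);

    /** Picks the opener pool from the table's priority; no ordering needed. */
    Future<?> submitOpen(int tablePriority, Runnable openTask) {
        ExecutorService pool =
            tablePriority >= HIGH_PRIORITY ? priorityPool : normalPool;
        return pool.submit(openTask);
    }

    void shutdown() {
        normalPool.shutdownNow();
        priorityPool.shutdownNow();
    }

    /** Three data region opens wait on one index region open. */
    static boolean demo() {
        PriorityOpenRouterSketch router = new PriorityOpenRouterSketch();
        CountDownLatch indexOnline = new CountDownLatch(1);
        CountDownLatch dataOpened = new CountDownLatch(3);
        try {
            for (int i = 0; i < 3; i++) {
                router.submitOpen(0, () -> {
                    try {
                        indexOnline.await(); // blocks a normal-pool worker
                        dataOpened.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Submitted last, but it does not queue behind the blocked workers.
            router.submitOpen(HIGH_PRIORITY, indexOnline::countDown);
            return dataOpened.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            router.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("all data regions opened: " + demo());
    }
}
```

The point of the design is that the index open arrives after the data opens 
have already occupied every normal worker, yet it still runs because it never 
shares a queue with the tasks that depend on it. 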

bq. Seems dodgy Enis Soztutar, brittle as Gary Helmling says.
See my comment at 
https://issues.apache.org/jira/browse/HBASE-16095?focusedCommentId=15347538&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15347538.
Transactions are an optional concept in Phoenix, and they are still not GA. Even 
if they were, not all use cases need transactions. We should still support 
secondary indexes without transactions in Phoenix for some time. I agree that 
the mutable index architecture as it is today should be redesigned to remove the 
inter-region dependency and the blocking of handlers. I am working on a proposal 
to do this using replication, but getting that fully working will take some 
time. Until then, we have real users and customers running with the current 
stuff that needs the fix. 

bq. Phoenix users will have to ensure they configure all index tables as 
PRIORITY (making index tables 'high priority' is a little unexpected)? For 
preexisting tables they'll have to go through and enable this everywhere? 
I should have linked the Phoenix issue. My bad. PHOENIX-3072 is the fix in 
Phoenix that would automatically configure the priorities.

BTW, I think that the priority definition in the table descriptor also serves 
another purpose. We can use it in RPC scheduling itself, so it should be 
useful in itself regardless of Phoenix. Moreover, I was thinking that although 
HBase "does not support" region interdependencies, we still have important 
tables with dependencies for most of the frameworks, like the commit table in 
Omid and the catalog/stats tables in Phoenix, as well as hbase-level system 
tables that use this.

> Add priority to TableDescriptor and priority region open thread pool
> --------------------------------------------------------------------
>
>                 Key: HBASE-16095
>                 URL: https://issues.apache.org/jira/browse/HBASE-16095
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 0.98.21
>
>         Attachments: HBASE-16095-0.98.patch, HBASE-16095-0.98.patch, 
> hbase-16095_v0.patch, hbase-16095_v1.patch, hbase-16095_v2.patch, 
> hbase-16095_v3.patch
>
>
> This is in a similar area to HBASE-15816, and is also required for the 
> current secondary indexing in Phoenix. 
> The problem with P secondary indexes is that data table regions depend on 
> index regions to be able to make progress. Possible distributed deadlocks can 
> be prevented via custom RpcScheduler + RpcController configuration via 
> HBASE-11048 and PHOENIX-938. However, region opening also has the same 
> deadlock situation, because data region open has to replay the WAL edits to 
> the index regions. There is only 1 thread pool to open regions with 3 workers 
> by default. So if the cluster is recovering / restarting from scratch, the 
> deadlock happens because some index regions cannot be opened: they sit in 
> the same queue behind data region opens, which in turn wait on RPCs to 
> index regions that are not yet open. This is reproduced in almost all 
> Phoenix secondary index clusters (mutable table w/o transactions) that we 
> see. 
> The proposal is to have a "high priority" region opening thread pool, and 
> have the HTD carry the relative priority of a table. This may be useful for 
> other "framework" level tables from Phoenix, Tephra, Trafodion, etc. if they 
> want some specific tables to become online faster. 
> As a follow up patch, we can also take a look at how this priority 
> information can be used by the rpc scheduler on the server side or rpc 
> controller on the client side, so that we do not have to set priorities 
> manually per-operation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
