[
https://issues.apache.org/jira/browse/HBASE-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347538#comment-15347538
]
Enis Soztutar commented on HBASE-16095:
---------------------------------------
bq. Do the data regions really need to block opening before index regions are
available?
Right now, index consistency (making sure that all data is safely reflected
in the indexes) relies on the WAL replay that happens in the region opening
code. That is where the deadlock becomes a problem.
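To make the shape of the deadlock concrete, here is a toy reproduction in
plain Java. The class and variable names are made up, and an ExecutorService
stands in for the region-open pool with its default of 3 workers (see the
issue description below); this is not HBase code.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class OpenPoolDeadlockDemo {
  public static void main(String[] args) throws Exception {
    // Stand-in for the region-open pool: 3 workers sharing one queue.
    ExecutorService openPool = Executors.newFixedThreadPool(3);
    CountDownLatch indexRegionsOpen = new CountDownLatch(1);

    // Data-region opens are queued first. Their WAL replay cannot finish
    // until the index regions are open, so each one blocks a worker.
    List<Future<?>> dataOpens = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      final int region = i;
      dataOpens.add(openPool.submit(() -> {
        try {
          indexRegionsOpen.await(); // "replay WAL edits to the index regions"
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        System.out.println("data region " + region + " opened");
      }));
    }

    // The index-region open would unblock everything, but it is queued behind
    // the data regions and every worker is already stuck waiting on it.
    openPool.submit(() -> {
      indexRegionsOpen.countDown();
      System.out.println("index region opened");
    });

    try {
      dataOpens.get(0).get(5, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
      System.out.println("deadlocked: all open workers wait on a queued index open");
    }
    openPool.shutdownNow();
  }
}
{code}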
bq. This seems like a brittle way to approach this. In general, building
dependency ordering into distributed systems has a bit of a code smell to it.
Better to make each part of a distributed system resilient to failure. Is there
another way to approach this from the phoenix side?
Completely agree. I think the current mutable secondary index implementation
(without transactions) is broken for Phoenix, not just because of deadlocks,
but also because the handlers stay occupied doing the RPCs to the index
regions. Due to the way HRegion MVCC works, no write can complete until the
longest-running RPC completes (MVCC writes commit in serial order). I have a
write-up and an (upcoming) proposal to evaluate replication-based secondary
indexing.
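To illustrate the serial-commit point, here is a simplified sketch; it is not
HBase's actual MultiVersionConcurrencyControl, and the names are invented.
{code:java}
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

public class SerialMvccSketch {
  private final AtomicLong nextSeq = new AtomicLong();
  private final ConcurrentSkipListMap<Long, Boolean> inFlight =
      new ConcurrentSkipListMap<>();
  private volatile long readPoint = 0; // writes visible to readers up to here

  /** Begin a write: assign it the next sequence number. */
  public long begin() {
    long seq = nextSeq.incrementAndGet();
    inFlight.put(seq, Boolean.FALSE);
    return seq;
  }

  /** Finish a write. The read point only advances past it once every
   *  earlier write has also finished: completion is published in order. */
  public synchronized void complete(long seq) {
    inFlight.put(seq, Boolean.TRUE);
    while (!inFlight.isEmpty()
        && inFlight.firstKey() == readPoint + 1
        && inFlight.firstEntry().getValue()) {
      inFlight.pollFirstEntry();
      readPoint++;
    }
  }

  public long readPoint() {
    return readPoint;
  }

  public static void main(String[] args) {
    SerialMvccSketch mvcc = new SerialMvccSketch();
    long w1 = mvcc.begin(); // e.g. a write whose handler is stuck on an index RPC
    long w2 = mvcc.begin();
    long w3 = mvcc.begin();
    mvcc.complete(w2);
    mvcc.complete(w3);
    // w2 and w3 are done, but readers still cannot see them.
    System.out.println("read point before w1 completes: " + mvcc.readPoint()); // 0
    mvcc.complete(w1);
    System.out.println("read point after w1 completes:  " + mvcc.readPoint()); // 3
  }
}
{code}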
However, until we implement a better approach, we still have to fix the
deadlock issue in the current implementation. I feel that priority region
opening might still be useful in other contexts as well (like opening
framework-level tables sooner), so we should still pursue this.
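A minimal sketch of what the priority open pool could look like. The
threshold, pool sizes, attribute handling, and class names here are invented
for illustration; this is not the attached patch.
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PriorityRegionOpenDispatcher {
  // Hypothetical cut-off: tables at or above this open on the priority pool.
  private static final int HIGH_PRIORITY_THRESHOLD = 200;

  private final ExecutorService defaultOpenPool = Executors.newFixedThreadPool(3);
  private final ExecutorService priorityOpenPool = Executors.newFixedThreadPool(3);

  /** Stand-in for the table descriptor; only the priority matters here. */
  public static class TableDesc {
    private final int priority;
    public TableDesc(int priority) { this.priority = priority; }
    public int getPriority() { return priority; }
  }

  /** Route a region-open task by table priority, so index (or other
   *  "framework") tables cannot be starved by ordinary data-table opens. */
  public void submitOpen(TableDesc table, Runnable openRegionTask) {
    if (table.getPriority() >= HIGH_PRIORITY_THRESHOLD) {
      priorityOpenPool.submit(openRegionTask);
    } else {
      defaultOpenPool.submit(openRegionTask);
    }
  }
}
{code}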
> Add priority to TableDescriptor and priority region open thread pool
> --------------------------------------------------------------------
>
> Key: HBASE-16095
> URL: https://issues.apache.org/jira/browse/HBASE-16095
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.4.0
>
> Attachments: hbase-16095_v0.patch
>
>
> This is in a similar area to HBASE-15816, and is also required for the
> current secondary indexing in Phoenix.
> The problem with Phoenix secondary indexes is that data table regions depend
> on index regions to be able to make progress. Possible distributed deadlocks
> can be prevented with the custom RpcScheduler + RpcController configuration
> from HBASE-11048 and PHOENIX-938. However, region opening has the same
> deadlock situation, because a data region open has to replay its WAL edits
> to the index regions. There is only one thread pool to open regions, with 3
> workers by default. So if the cluster is recovering / restarting from
> scratch, the deadlock happens because some index regions cannot be opened:
> they sit in the same queue behind data regions whose opens are waiting on
> RPCs to index regions that are not yet open. We see this reproduced in
> almost all Phoenix secondary index clusters (mutable tables without
> transactions).
> The proposal is to have a "high priority" region opening thread pool, and to
> have the HTD carry the relative priority of a table. This may also be useful
> for other "framework" level tables from Phoenix, Tephra, Trafodion, etc. if
> they want specific tables to come online faster.
> As a follow-up patch, we can also look at how this priority information can
> be used by the RPC scheduler on the server side or the RPC controller on the
> client side, so that we do not have to set priorities manually per operation.
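On the client side of that follow-up, a rough sketch in the spirit of the
custom RpcController setup from HBASE-11048 / PHOENIX-938, assuming the HBase
1.x RpcControllerFactory / PayloadCarryingRpcController API (names differ in
other versions); the table-name convention and priority choice are invented:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.ipc.PayloadCarryingRpcController;
import org.apache.hadoop.hbase.ipc.RpcControllerFactory;

/**
 * Sketch: a client-side controller factory that raises the RPC priority for
 * designated index tables, so callers do not set it per operation. It would
 * be plugged in via hbase.rpc.controllerfactory.class.
 */
public class IndexPriorityRpcControllerFactory extends RpcControllerFactory {

  public IndexPriorityRpcControllerFactory(Configuration conf) {
    super(conf);
  }

  @Override
  public PayloadCarryingRpcController newController() {
    return new PayloadCarryingRpcController() {
      @Override
      public void setPriority(TableName tn) {
        // "IDX_" prefix is a made-up convention for this sketch; a real
        // implementation would read the priority off the table descriptor.
        if (tn != null && tn.getQualifierAsString().startsWith("IDX_")) {
          setPriority(HConstants.HIGH_QOS);
        } else {
          super.setPriority(tn);
        }
      }
    };
  }
}
{code}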
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)