[
https://issues.apache.org/jira/browse/HBASE-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050198#comment-15050198
]
Lars Hofhansl commented on HBASE-14953:
---------------------------------------
Interesting, didn't think of that case. Amazing how many problems a little
change like this can cause.
Why not add a real queue (i.e. not a SynchronousQueue)? In that case we would
also need to set coreThreads to maxThreads and allow core threads to time out.
Since we're waiting on the futures to finish anyway, while tasks sit in the
queue we'd naturally wait exactly the right amount of time, so the queue can be
unbounded. Eventually we'd have all workers waiting, which is what we want.
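
For illustration, here is a minimal sketch of the pool configuration suggested
above, using only java.util.concurrent. It is not the endpoint's actual code:
the class and method names (UnboundedQueuePoolSketch, newPool, submitAndWait)
and the sleep standing in for shipping are assumptions made up for this
example. The core size has to equal the max size because a ThreadPoolExecutor
only starts threads beyond the core size when its queue refuses a task, which
an unbounded queue never does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueuePoolSketch {

    // Hypothetical helper: a pool that queues work instead of rejecting it.
    // keepAliveSeconds must be > 0 for allowCoreThreadTimeOut(true).
    static ThreadPoolExecutor newPool(int maxThreads, long keepAliveSeconds) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads,               // core == max
            keepAliveSeconds, TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>()); // unbounded "real" queue
        pool.allowCoreThreadTimeOut(true);        // idle threads may exit
        return pool;
    }

    // Submits `tasks` dummy tasks and blocks on every future, the way a
    // caller that waits for its sub-batches would. Returns how many finished.
    static int submitAndWait(ThreadPoolExecutor pool, int tasks) {
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            futures.add(pool.submit(() -> {
                Thread.sleep(5);                  // stand-in for shipping edits
                return id;
            }));
        }
        int done = 0;
        for (Future<Integer> f : futures) {
            try {
                f.get();                          // waiting here is the back-pressure
                done++;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("task failed", e);
            }
        }
        return done;
    }
}
```

With this setup submit() does not throw RejectedExecutionException while the
pool is running. The amount of queued work stays limited only because each
caller blocks on its own futures before submitting more.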
> HBaseInterClusterReplicationEndpoint: Do not retry the whole batch of edits
> in case of RejectedExecutionException
> -----------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-14953
> URL: https://issues.apache.org/jira/browse/HBASE-14953
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 2.0.0, 1.2.0, 1.3.0
> Reporter: Ashu Pachauri
> Assignee: Ashu Pachauri
> Priority: Critical
> Attachments: HBASE-14953-V1.patch
>
>
> When the WAL provider is set to multiwal, the ReplicationSource has multiple
> worker threads submitting batches to HBaseInterClusterReplicationEndpoint. In
> such a scenario, it is quite common to encounter RejectedExecutionException
> because shipping edits to the peer cluster takes much longer than reading
> edits from the source and submitting more batches to the endpoint.
> The logs fill up with warnings caused by this exception.
> Since we subdivide batches before actually shipping them, we don't need to
> fail and resend the whole batch if one of the sub-batches fails with
> RejectedExecutionException. Rather, we should just retry the failed
> sub-batches.
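
The retry idea in the description can be sketched roughly as follows. This is
an illustration only, not the attached patch: SubBatchRetrySketch, shipAll,
the shipper callback, maxRounds and backoffMillis are all hypothetical names
and parameters. Sub-batches that are rejected at submit time, or whose task
fails, are collected and resubmitted; sub-batches that completed are left
alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Consumer;

public class SubBatchRetrySketch {

    // Ships every sub-batch, resubmitting only those that were rejected or
    // failed. Returns the total number of resubmissions that were needed.
    static <T> int shipAll(ExecutorService pool, List<List<T>> subBatches,
                           Consumer<List<T>> shipper, int maxRounds,
                           long backoffMillis) {
        List<Integer> pending = new ArrayList<>();
        for (int i = 0; i < subBatches.size(); i++) {
            pending.add(i);
        }
        int retries = 0;
        for (int round = 0; !pending.isEmpty(); round++) {
            if (round >= maxRounds) {
                throw new IllegalStateException("gave up after " + maxRounds + " rounds");
            }
            List<Integer> failed = new ArrayList<>();    // indices to retry
            List<Integer> submitted = new ArrayList<>(); // index of each future
            List<Future<?>> futures = new ArrayList<>();
            for (int idx : pending) {
                List<T> batch = subBatches.get(idx);
                try {
                    futures.add(pool.submit(() -> shipper.accept(batch)));
                    submitted.add(idx);
                } catch (RejectedExecutionException e) {
                    failed.add(idx);                     // never ran; retry it alone
                }
            }
            try {
                for (int i = 0; i < futures.size(); i++) {
                    try {
                        futures.get(i).get();
                    } catch (ExecutionException e) {
                        failed.add(submitted.get(i));    // shipping threw; retry it
                    }
                }
                if (!failed.isEmpty()) {
                    Thread.sleep(backoffMillis);         // let the workers free up
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while shipping", e);
            }
            retries += failed.size();
            pending = failed;
        }
        return retries;
    }
}
```

A rejected task never started, so resubmitting it cannot ship the same edits
twice. A task that failed midway may have shipped part of its sub-batch, so
this sketch assumes that re-shipping a sub-batch is safe.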