[
https://issues.apache.org/jira/browse/HBASE-24757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163782#comment-17163782
]
Viraj Jasani commented on HBASE-24757:
--------------------------------------
We already have a warning based on row count (HBASE-18023), and we also have a
config to reject the batch based on row count (HBASE-24024). The main purpose
here is to prevent unnecessary warnings when the row count sent by the client
is too high but the request is perfectly valid replication traffic (which can
accumulate maybe 10 times the row count threshold). We want replication to at
least follow the row count limit rules by batching appropriately, so that it
does not produce warnings.
On the other hand, yes, fat edits with huge size (rather than high row count)
are definitely the more important case to deal with. However, that size limit
is already in place via HBASE-18027, where we calculate byte size and batch
accordingly. It's the redundant row-count-based warnings that have been
observed quite often.
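To make the row-count batching concrete, here is a minimal sketch of the idea,
not the actual ReplicationSink change. The class RowBatcher and its partition
method are hypothetical names used only for illustration; the caller is
assumed to pass in the value read from hbase.rpc.rows.warning.threshold.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: splits a list of rows into
// consecutive batches so that no batch exceeds the row count threshold.
final class RowBatcher {
  private RowBatcher() {}

  // maxRowsPerBatch is assumed to come from hbase.rpc.rows.warning.threshold.
  static <T> List<List<T>> partition(List<T> rows, int maxRowsPerBatch) {
    if (maxRowsPerBatch <= 0) {
      throw new IllegalArgumentException("maxRowsPerBatch must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    int start = 0;
    while (start < rows.size()) {
      // Take at most maxRowsPerBatch rows, or whatever is left at the tail.
      int end = start + Math.min(maxRowsPerBatch, rows.size() - start);
      // Copy the slice so each batch is independent of the source list.
      batches.add(new ArrayList<>(rows.subList(start, end)));
      start = end;
    }
    return batches;
  }
}
```

With 12000 rows and a threshold of 5000, this yields three batches of 5000,
5000 and 2000 rows, each of which could then be sent as its own batch mutate
call and so stay at or below the threshold.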
Thanks.
> ReplicationSink should limit the batch size for batch mutations based on
> hbase.rpc.rows.warning.threshold
> ---------------------------------------------------------------------------------------------------------
>
> Key: HBASE-24757
> URL: https://issues.apache.org/jira/browse/HBASE-24757
> Project: HBase
> Issue Type: Improvement
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
>
> At times there is quite a large number of WAL Edits to ship as part of
> Replication, and sometimes replication queues accumulate a huge list of Edits
> to process. ReplicationSink at the sink server usually goes through all Edits,
> creates a map of table -> list of rows grouped by clusterIds, and performs a
> batch mutation of all rows at the per-table level. However, there is no limit
> on the number of rows sent as part of a batch mutate call. If the number of
> rows exceeds the threshold defined by hbase.rpc.rows.warning.threshold, we
> usually get the warning "Large batch operation detected". If
> hbase.rpc.rows.size.threshold.reject is turned on, the RS will reject the
> whole batch without processing it.
> We should let ReplicationSink honour this threshold value and accordingly
> keep the number of rows per batch mutation call lower.
> Replication-triggered batch mutations should always be consumed, but keeping
> the mutation limit low enough will let the system function at the same pace
> without triggering warnings. This will also restrict exploitation of heap
> and CPU cycles at the destination.