Andrew Purtell created HBASE-18027:
--------------------------------------
Summary: HBaseInterClusterReplicationEndpoint should respect RPC
size limits when batching edits
Key: HBASE-18027
URL: https://issues.apache.org/jira/browse/HBASE-18027
Project: HBase
Issue Type: Bug
Components: Replication
Affects Versions: 1.3.1, 2.0.0, 1.4.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Fix For: 2.0.0, 1.4.0, 1.3.2
In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in
batches. We create N lists, where N is the minimum of the configured number of
replicator threads, the number of 100-WALEdit batches, and the number of
current sinks. Every pending entry in the replication context is then placed,
in order, into one of these N lists, chosen by the hash of its encoded region
name. Each of the N lists is then sent all at once in one replication RPC. We
do not test whether the total size of the data in a list will exceed RPC size
limits. The code presumes each individual edit is reasonably small. Not
checking the aggregate size while assembling the lists into RPCs is an
oversight and can lead to replication failure when that assumption is
violated.
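
The batching described above can be sketched roughly as follows. This is an
illustration only, not the actual HBase code: the Entry class, the method
names, and the "entries / 100 + 1" reading of "number of 100-WALEdit batches"
are assumptions made for the example. Note that nothing in it looks at the
total bytes that land in each list.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BatchPartitionSketch {
    /** Hypothetical stand-in for a pending WAL entry (not the HBase class). */
    static final class Entry {
        final byte[] encodedRegionName;
        final long sizeBytes;

        Entry(String region, long sizeBytes) {
            this.encodedRegionName = region.getBytes(StandardCharsets.UTF_8);
            this.sizeBytes = sizeBytes;
        }
    }

    /** N = min(replicator threads, 100-entry batches, sinks), at least 1. */
    static int listCount(int replicatorThreads, int pendingEntries, int sinks) {
        // Assumed interpretation of "number of 100-WALEdit batches".
        int batchesOf100 = pendingEntries / 100 + 1;
        return Math.max(1, Math.min(Math.min(replicatorThreads, batchesOf100), sinks));
    }

    /** Places each entry, in order, into the list picked by its region name hash. */
    static List<List<Entry>> partition(List<Entry> entries, int n) {
        List<List<Entry>> lists = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            lists.add(new ArrayList<>());
        }
        for (Entry e : entries) {
            // floorMod keeps the index non-negative for negative hash codes.
            int idx = Math.floorMod(Arrays.hashCode(e.encodedRegionName), n);
            lists.get(idx).add(e);
        }
        return lists;
    }

    public static void main(String[] args) {
        List<Entry> pending = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            pending.add(new Entry("region" + (i % 7), 1024L));
        }
        int n = listCount(10, pending.size(), 4);
        for (List<Entry> list : partition(pending, n)) {
            long bytes = 0;
            for (Entry e : list) {
                bytes += e.sizeBytes;
            }
            // Each list would go out as a single RPC regardless of this total.
            System.out.println(list.size() + " entries, " + bytes + " bytes");
        }
    }
}
```

Because all entries for a region hash to the same list, a few busy regions or
a few large edits can make one list far larger than the others.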
We can fix this by generating as many replication RPC calls as we need to drain
a list, keeping each RPC under the size limit, instead of assuming the whole
list will fit in one.
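
A minimal sketch of that idea, assuming a greedy split in arrival order. The
class name, the split method, and the maxBytes parameter are hypothetical and
are not taken from the HBase code base or from the eventual patch. An entry
that is larger than the limit on its own still gets a batch to itself, since
it cannot be divided here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

final class SizeBoundedBatcher {
    /**
     * Splits items into consecutive batches whose summed size stays at or
     * under maxBytes. Order is preserved. A single item larger than maxBytes
     * becomes a batch of one.
     */
    static <T> List<List<T>> split(List<T> items, ToLongFunction<T> sizeOf, long maxBytes) {
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>();
        long currentBytes = 0;
        for (T item : items) {
            long size = sizeOf.applyAsLong(item);
            // Close the current batch before it would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(item);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Long> sizes = new ArrayList<>();
        sizes.add(40L);
        sizes.add(40L);
        sizes.add(40L);
        sizes.add(200L);
        sizes.add(10L);
        // Each inner list would be sent as its own replication RPC.
        for (List<Long> batch : split(sizes, Long::longValue, 100L)) {
            System.out.println(batch);
        }
    }
}
```

Splitting in order keeps the edits for a region in the same sequence they had
in the original list, which matters if the RPCs for one list are sent one
after another.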