Andrew Purtell created HBASE-18027:
--------------------------------------

             Summary: HBaseInterClusterReplicationEndpoint should respect RPC size limits when batching edits
                 Key: HBASE-18027
                 URL: https://issues.apache.org/jira/browse/HBASE-18027
             Project: HBase
          Issue Type: Bug
          Components: Replication
    Affects Versions: 1.3.1, 2.0.0, 1.4.0
            Reporter: Andrew Purtell
            Assignee: Andrew Purtell
             Fix For: 2.0.0, 1.4.0, 1.3.2


In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in 
batches. We create N lists, where N is the minimum of the configured number of 
replicator threads, the number of 100-WALEdit batches, and the number of 
current sinks. Every pending entry in the replication context is then placed, 
in order, into one of these N lists, chosen by hash of encoded region name. 
Each of the N lists is then sent all at once in one replication RPC. We do not 
test whether the total size of the data in any of the N lists will exceed RPC 
size limits. This code presumes each individual edit is reasonably small. Not 
checking aggregate size while assembling the lists into RPCs is an oversight 
and can lead to replication failure when that assumption is violated.
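
To make the current behavior concrete, here is a rough sketch of the 
partitioning step. It is not the HBase code. Entry, listCount, and partition 
are hypothetical names, and computing the number of 100-edit batches as 
pending / 100 + 1 is an assumption made for illustration. Note that nothing 
here looks at sizeBytes, which is the gap described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HashPartitionSketch {
    /** Hypothetical stand-in for a WAL entry: encoded region name plus payload size. */
    static final class Entry {
        final byte[] encodedRegionName;
        final long sizeBytes;

        Entry(byte[] encodedRegionName, long sizeBytes) {
            this.encodedRegionName = encodedRegionName;
            this.sizeBytes = sizeBytes;
        }
    }

    /** N = min(replicator threads, number of 100-edit batches, current sinks), at least 1. */
    static int listCount(int maxThreads, int pendingEntries, int numSinks) {
        int batchesOf100 = pendingEntries / 100 + 1; // assumed rounding, for illustration only
        return Math.max(1, Math.min(Math.min(maxThreads, batchesOf100), numSinks));
    }

    /** Places each entry, in order, into one of n lists chosen by hash of its region name. */
    static List<List<Entry>> partition(List<Entry> entries, int n) {
        List<List<Entry>> lists = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            lists.add(new ArrayList<>());
        }
        for (Entry e : entries) {
            // floorMod keeps the index non-negative even for negative hash codes.
            int idx = Math.floorMod(Arrays.hashCode(e.encodedRegionName), n);
            lists.get(idx).add(e); // aggregate size of the list is never checked
        }
        return lists;
    }
}
```

Because all edits for one region hash to the same list, a single busy region 
with large edits can make one list arbitrarily large regardless of N.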

We can fix this by generating as many replication RPC calls as we need to drain 
a list, keeping each RPC under the limit, instead of assuming the whole list 
will fit in one.
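
A minimal sketch of that idea, assuming a caller-supplied size function and 
byte limit. SizeBoundedBatcher, split, sizeOf, and maxBytes are hypothetical 
names and not part of HBase. Each returned sub-list would correspond to one 
replication RPC. An entry that is larger than the limit on its own is still 
emitted, alone in its own sub-list, since splitting a single edit is out of 
scope here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

public class SizeBoundedBatcher {
    /**
     * Splits entries, preserving order, into consecutive sub-lists whose summed
     * size stays at or under maxBytes whenever that is possible.
     */
    static <T> List<List<T>> split(List<T> entries, ToLongFunction<T> sizeOf, long maxBytes) {
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>();
        long currentBytes = 0;
        for (T e : entries) {
            long size = sizeOf.applyAsLong(e);
            // Close the current batch if adding this entry would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(e);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

For example, with a limit of 100 and entry sizes 40, 40, 40, 200, 10, this 
yields four sub-lists: [40, 40], [40], [200], [10].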



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
