[ 
https://issues.apache.org/jira/browse/HBASE-29665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaehui Lee updated HBASE-29665:
-------------------------------
    Affects Version/s: 2.5.12
                       2.6.3

> Bidirectional bulkload replication causes excessive network traffic
> -------------------------------------------------------------------
>
>                 Key: HBASE-29665
>                 URL: https://issues.apache.org/jira/browse/HBASE-29665
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.6.3, 2.5.12
>            Reporter: Jaehui Lee
>            Assignee: Jaehui Lee
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2025-10-16-21-59-13-156.png
>
>
> h2. Problem
> When performing a bulkload on one of two clusters configured with 
> bidirectional replication, the cluster executing the bulkload experiences 
> unexpectedly high network usage.
> h2. Root Cause
> HBASE-22380 prevented circular bulkload replication by having 
> {{SecureBulkloadManager}} check whether the current clusterId already exists in 
> {{clusterIds}}. If it does, the manager assumes replication has already occurred 
> and stops further processing.
> However, {{SecureBulkloadManager}} is invoked by the 
> {{LoadIncrementalHFiles}} tool, which copies the target HFiles to a staging 
> directory in the local HDFS _before_ checking whether replication should 
> proceed. This premature copying causes unnecessary network and disk usage.
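The ordering problem described above can be illustrated with a toy model. This is only a sketch: `StagingOrderSketch`, `copyThenCheck`, and `checkThenCopy` are hypothetical names, cluster IDs are plain strings, and the "copy" is just a byte counter. None of this is HBase code.

```java
import java.util.List;

/** Toy model of the ordering issue; hypothetical names, not HBase code. */
final class StagingOrderSketch {
    /** Total bytes copied into the (simulated) staging directory. */
    long stagedBytes = 0;

    /** Ordering from the root cause: copy to staging first, check clusterIds second. */
    boolean copyThenCheck(String localClusterId, List<String> clusterIds, long hfileBytes) {
        stagedBytes += hfileBytes; // the cost is paid even for a load that will be rejected
        return !clusterIds.contains(localClusterId);
    }

    /** Alternative ordering: check clusterIds first, copy only when the load proceeds. */
    boolean checkThenCopy(String localClusterId, List<String> clusterIds, long hfileBytes) {
        if (clusterIds.contains(localClusterId)) {
            return false; // already replicated here, so nothing is copied
        }
        stagedBytes += hfileBytes;
        return true;
    }
}
```

With `clusterIds` containing the local cluster, both methods reject the load, but only `copyThenCheck` increases `stagedBytes`. That wasted copy is the extra network and disk usage reported in this issue.
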
> h2. Solution
> Unlike {{clusterIds}} used in regular mutation replication (which are 
> included in {{WALKey}}), the {{clusterIds}} for bulkload replication are 
> managed in a separate class called {{BulkloadDescriptor}}. As a result, 
> they are not filtered by {{ClusterMarkingEntryFilter}}, and filtering 
> logic only runs after the bulkload request is received.
> The solution is to include {{clusterIds}} in the {{WALKey}} for bulkload 
> operations, just like regular mutations. This allows filtering to occur 
> before the bulkload request is processed, preventing unnecessary file copying.
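The effect of carrying the cluster IDs in the key can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual patch: `ClusterIdFilterSketch`, `SketchKey`, and `shouldShip` are hypothetical names, and the real `ClusterMarkingEntryFilter` may apply additional conditions.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Simplified model of source-side filtering by cluster ID; hypothetical names. */
final class ClusterIdFilterSketch {

    /** Stand-in for a WAL key that records which clusters an edit has already visited. */
    static final class SketchKey {
        final Set<UUID> clusterIds;

        SketchKey(Set<UUID> clusterIds) {
            // Defensive copy so later changes to the caller's set do not leak in.
            this.clusterIds = Collections.unmodifiableSet(new HashSet<>(clusterIds));
        }
    }

    /** Ship an entry only if the peer is not among the clusters it has already visited. */
    static boolean shouldShip(SketchKey key, UUID peerClusterId) {
        return !key.clusterIds.contains(peerClusterId);
    }
}
```

In this model, a bulkload that originated on Cluster A and was replayed on Cluster B is shipped back to A when the key's `clusterIds` is empty (the IDs live only in the descriptor). When the key includes A's ID, `shouldShip` returns false on B, so A never receives the request and never copies the files.
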
> h2. Test
> Setup
>  * Two clusters (Cluster A and Cluster B) running HBase 2.6.3
>  * HBase and HDFS clusters are separated (compute-storage separation 
> architecture)
>  * Bulkload replication and bidirectional replication enabled
>  * Bulkload executed on Cluster A only
> !image-2025-10-16-21-59-13-156.png|width=682,height=580!
> Since the bulkload is executed only on Cluster A, resource usage should be 
> identical between scenarios 1 and 2. However, as shown in the metrics above, 
> scenario 1 consumes significantly more resources. This is due to the 
> unnecessary copying of HFiles to the staging directory, as explained in the 
> root cause section.
> After applying the patch, scenario 3 shows resource usage identical to 
> scenario 2, confirming that the unnecessary file copying has been eliminated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
