[
https://issues.apache.org/jira/browse/HBASE-29665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jaehui Lee updated HBASE-29665:
-------------------------------
Affects Version/s: 2.5.12
2.6.3
> Bidirectional bulkload replication causes excessive network traffic
> -------------------------------------------------------------------
>
> Key: HBASE-29665
> URL: https://issues.apache.org/jira/browse/HBASE-29665
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 2.6.3, 2.5.12
> Reporter: Jaehui Lee
> Assignee: Jaehui Lee
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2025-10-16-21-59-13-156.png
>
>
> h2. Problem
> When performing a bulkload on one of two clusters configured with
> bidirectional replication, the cluster executing the bulkload experiences
> unexpectedly high network usage.
> h2. Root Cause
> HBASE-22380 prevented circular bulkload replication by having
> {{SecureBulkloadManager}} check if the current clusterId already exists in
> {{clusterIds}}. If present, it assumes replication has already occurred
> and stops further processing.
> However, {{SecureBulkloadManager}} is invoked by the
> {{LoadIncrementalHFiles}} tool, which copies the target HFiles to a staging
> directory in the local HDFS _before_ checking whether replication should
> proceed. This premature copying causes unnecessary network and disk usage.
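To make the cost of that ordering concrete, here is a minimal sketch in plain Java. It does not use any HBase classes: `StagingOrderSketch`, `BulkloadRequest`, `copyThenCheck`, and `checkThenCopy` are hypothetical names invented for illustration, and the byte counter merely stands in for the network and disk traffic of the staging copy. It is not the patch itself, only a model of why checking after copying is wasteful.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not HBase code: compares "copy, then check" with
// "check, then copy" for a bulkload request that has looped back.
class StagingOrderSketch {

    /** Simplified stand-in for a replicated bulkload request. */
    static final class BulkloadRequest {
        final List<String> clusterIds; // clusters that already applied this bulkload
        final long hfileBytes;         // total size of the HFiles to stage

        BulkloadRequest(List<String> clusterIds, long hfileBytes) {
            this.clusterIds = clusterIds;
            this.hfileBytes = hfileBytes;
        }
    }

    /** Bytes copied into the staging directory (proxy for network/disk usage). */
    long stagedBytes = 0;

    private void copyToStaging(BulkloadRequest req) {
        stagedBytes += req.hfileBytes;
    }

    /** Order described in the root cause: files are staged before the check. */
    boolean copyThenCheck(BulkloadRequest req, String localClusterId) {
        copyToStaging(req);
        return !req.clusterIds.contains(localClusterId);
    }

    /** Alternative order: a looped-back request is rejected before any copy. */
    boolean checkThenCopy(BulkloadRequest req, String localClusterId) {
        if (req.clusterIds.contains(localClusterId)) {
            return false;
        }
        copyToStaging(req);
        return true;
    }

    public static void main(String[] args) {
        // A request that originated on cluster A and came back from cluster B.
        BulkloadRequest echoed = new BulkloadRequest(Arrays.asList("A", "B"), 1024L);

        StagingOrderSketch copyFirst = new StagingOrderSketch();
        copyFirst.copyThenCheck(echoed, "A");

        StagingOrderSketch checkFirst = new StagingOrderSketch();
        checkFirst.checkThenCopy(echoed, "A");

        System.out.println("copy-then-check staged bytes: " + copyFirst.stagedBytes);
        System.out.println("check-then-copy staged bytes: " + checkFirst.stagedBytes);
    }
}
```

In both orderings the request is rejected on cluster A; only the first ordering pays for the copy before rejecting it.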
> h2. Solution
> Unlike {{clusterIds}} used in regular mutation replication (which are
> included in {{WALKey}}), the {{clusterIds}} for bulkload replication are
> managed in a separate class called {{BulkloadDescriptor}}. As a result,
> they are not filtered by {{ClusterMarkingEntryFilter}}, and filtering
> logic only runs after the bulkload request is received.
> The solution is to include {{clusterIds}} in the {{WALKey}} for bulkload
> operations, just like regular mutations. This allows filtering to occur
> before the bulkload request is processed, preventing unnecessary file copying.
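The effect of moving the cluster IDs into the WAL key can be sketched as follows. This is a simplified model in plain Java, not the actual patch: `WalKeyFilterSketch`, `Entry`, `ids`, and `filterForPeer` are hypothetical names, and `filterForPeer` only mimics the idea of a source-side filter that drops entries whose key already lists the peer cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not HBase code: a source-side filter that only
// inspects the cluster IDs stored in the WAL key of each entry.
class WalKeyFilterSketch {

    /** Simplified WAL entry for a bulkload event. */
    static final class Entry {
        final Set<String> keyClusterIds;        // IDs carried in the WAL key
        final Set<String> descriptorClusterIds; // IDs carried in the bulkload descriptor

        Entry(Set<String> keyClusterIds, Set<String> descriptorClusterIds) {
            this.keyClusterIds = keyClusterIds;
            this.descriptorClusterIds = descriptorClusterIds;
        }
    }

    static Set<String> ids(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    /** Keeps only the entries the peer has not seen, judged by the key alone. */
    static List<Entry> filterForPeer(List<Entry> entries, String peerClusterId) {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.keyClusterIds.contains(peerClusterId)) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Cluster B has applied a bulkload that originated on cluster A.
        // Before the change: A is recorded only in the descriptor.
        Entry beforeFix = new Entry(ids(), ids("A"));
        // After the change: A is also recorded in the key.
        Entry afterFix = new Entry(ids("A"), ids("A"));

        System.out.println("shipped back to A before: "
            + filterForPeer(Arrays.asList(beforeFix), "A").size());
        System.out.println("shipped back to A after:  "
            + filterForPeer(Arrays.asList(afterFix), "A").size());
    }
}
```

In this model the entry whose key is empty passes the filter and is shipped back to cluster A, where the staging copy happens before the request is rejected. Once the key carries the originating cluster ID, the entry is dropped on cluster B and no request reaches cluster A.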
> h2. Test
> Setup
> * Two clusters (Cluster A and Cluster B) running HBase 2.6.3
> * HBase and HDFS clusters are separated (compute-storage separation
> architecture)
> * Bulkload replication and bidirectional replication enabled
> * Bulkload executed on Cluster A only
> !image-2025-10-16-21-59-13-156.png|width=682,height=580!
> Since the bulkload is executed only on Cluster A, resource usage should be
> identical between scenarios 1 and 2. However, as shown in the metrics above,
> scenario 1 consumes significantly more resources. This is due to the
> unnecessary copying of HFiles to the staging directory, as explained in the
> root cause section.
> After applying the patch, scenario 3 shows resource usage identical to
> scenario 2, confirming that the unnecessary file copying has been eliminated.