[jira] [Created] (HBASE-30234) Replication shippedBytes metric overflows to negative due to int truncation of batch size

terrytlu (Jira) Thu, 18 Jun 2026 03:03:09 -0700

terrytlu created HBASE-30234:
--------------------------------

             Summary: Replication shippedBytes metric overflows to negative due 
to int truncation of batch size
                 Key: HBASE-30234
                 URL: https://issues.apache.org/jira/browse/HBASE-30234
             Project: HBase
          Issue Type: Bug
          Components: metrics, Replication
    Affects Versions: 2.4.5
            Reporter: terrytlu



Summary
-------
Several size-tracking variables in the replication source pipeline use `int`
instead of `long`, causing integer overflow when the cumulative WAL entry
batch size exceeds Integer.MAX_VALUE (~2GB). This results in negative values
for JMX metrics (e.g., `shippedBytes`) and incorrect throttling behavior.

Observed Symptoms
-----------------
- The RegionServer JMX metric `shippedBytes` reports negative values.
- Replication throttling may malfunction since the bandwidth calculation
  receives a negative batch size.

Root Cause
----------
In `ReplicationSourceShipper.shipEdits()`, the heap size of a WAL entry batch
is cast from `long` to `int`:

    int currentSize = (int) entryBatch.getHeapSize();

`WALEntryBatch.getHeapSize()` returns a `long`, but the downcast to `int`
causes silent overflow when the value exceeds 2,147,483,647 bytes (~2GB).
This truncated value propagates through:

1. `ReplicationSource.tryThrottle(int batchSize)`: the throttler receives a
   negative size, producing incorrect sleep intervals.
2. `MetricsSource.shipBatch(long batchSize, int sizeInBytes)`: the
   `shippedBytes` metric is incremented by a negative value.
3. `ReplicationEndpoint.ReplicateContext.size` (int): the endpoint receives
   the truncated size.
4. `ReplicationSourceWALReader.sizeOfStoreFilesIncludeBulkLoad()`: accumulates
   store file sizes into an `int`, which also overflows for bulk loads with
   large store files:

       totalStoreFilesSize = (int) (totalStoreFilesSize +
           stores.get(j).getStoreFileSizeBytes());

5. `ReplicationThrottler.getNextSleepInterval(int size)`: accepts an `int`
   and loses precision.
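
The truncation described above can be reproduced outside HBase. The following
is a minimal standalone sketch, not HBase code: the class
`IntTruncationSketch`, its method names, and the byte counts are hypothetical
and chosen only for illustration. It mirrors the two casts quoted in the root
cause and compares them with the same arithmetic kept in `long`.

```java
// Hypothetical illustration only; none of these names exist in HBase.
final class IntTruncationSketch {

    // Same shape as the cast quoted from shipEdits():
    // a long-to-int narrowing keeps only the low 32 bits.
    static int truncatedBatchSize(long heapSize) {
        return (int) heapSize;
    }

    // Same shape as the accumulation quoted from
    // sizeOfStoreFilesIncludeBulkLoad(): the running total is an int.
    static int sumAsInt(long... storeFileSizes) {
        int total = 0;
        for (long size : storeFileSizes) {
            total = (int) (total + size);
        }
        return total;
    }

    // The same accumulation with a long running total.
    static long sumAsLong(long... storeFileSizes) {
        long total = 0L;
        for (long size : storeFileSizes) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        long heapSize = 3_000_000_000L; // assumed batch size above Integer.MAX_VALUE
        System.out.println(truncatedBatchSize(heapSize)); // prints -1294967296

        long oneGiB = 1L << 30; // three assumed 1 GiB store files
        System.out.println(sumAsInt(oneGiB, oneGiB, oneGiB));  // prints -1073741824
        System.out.println(sumAsLong(oneGiB, oneGiB, oneGiB)); // prints 3221225472
    }
}
```

Note that the truncated value is not always negative: a size of
5,000,000,000 bytes narrows to 705,032,704, so a metric could also be
under-reported without ever showing a negative number.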



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
