gnuhpc opened a new issue, #2346:
URL: https://github.com/apache/fluss/issues/2346

   ## Bug Description
   
   `IndexOutOfBoundsException` occurs in `ArrowVarCharWriter.doWrite()` when 
writing VARCHAR fields to Arrow WAL under moderate concurrency (8 threads). 
This bug causes **silent write failures** on the tablet server side, leading to 
potential data loss in production environments.
   
   ### Stack Trace
   
   ```
   ERROR org.apache.fluss.server.replica.ReplicaManager - Error put records to 
local kv on replica TableBucket{tableId=18, bucket=0}
   java.lang.IndexOutOfBoundsException: null
        at java.nio.ByteBuffer.wrap(Unknown Source)
        at 
org.apache.fluss.memory.MemorySegment.wrapInternal(MemorySegment.java:258)
        at org.apache.fluss.memory.MemorySegment.wrap(MemorySegment.java:252)
        at 
org.apache.fluss.row.BinarySection.wrapByteBuffer(BinarySection.java:112)
        at 
org.apache.fluss.row.arrow.writers.ArrowVarCharWriter.doWrite(ArrowVarCharWriter.java:37)
        at 
org.apache.fluss.row.arrow.writers.ArrowFieldWriter.write(ArrowFieldWriter.java:59)
        at org.apache.fluss.row.arrow.ArrowWriter.writeRow(ArrowWriter.java:206)
        at 
org.apache.fluss.record.MemoryLogRecordsArrowBuilder.append(MemoryLogRecordsArrowBuilder.java:183)
        at 
org.apache.fluss.server.kv.wal.ArrowWalBuilder.append(ArrowWalBuilder.java:48)
        at org.apache.fluss.server.kv.KvTablet.applyUpdate(KvTablet.java:519)
   ```
   
   ---
   
   ## Reproduction
   
   ### Environment
   - **Fluss Version**: 0.9-SNAPSHOT (latest main branch as of 2026-01-11)
   - **Cluster**: 1 coordinator + 1 tablet server (Docker)
   - **JDK**: OpenJDK 11+
   
   ### Reproduction Program
   
   I created a standalone Java program that reliably reproduces this bug. See 
the attached `ArrowWalRepro.java` file.
   
   **Key characteristics:**
   - Creates a table with `INT` primary key + `VARCHAR` payload column
   - Spawns multiple concurrent writer threads
   - Each thread performs UPSERT operations with VARCHAR data
   - Triggers buffer reuse/memory segment pooling scenarios
   
   ### Reproduction Steps
   
   1. **Compile the reproduction program:**
   ```bash
   # Assuming Fluss source is built at /path/to/fluss
   javac -cp "$(find /path/to/fluss -name '*.jar' | tr '\n' ':')" 
ArrowWalRepro.java
   ```
   
   2. **Run the reproduction:**
   ```bash
   java -cp "$(find /path/to/fluss -name '*.jar' | tr '\n' ':'):." 
ArrowWalRepro \
     --bootstrap localhost:9123 \
     --threads 8 \
     --records 20000 \
     --payload-size 256 \
     --buffer-memory 8mb \
     --buffer-page 64kb
   ```
   
   3. **Check tablet server logs:**
   ```bash
   # You should see IndexOutOfBoundsException errors
   grep -A 20 "IndexOutOfBoundsException" <tablet-server-log-file>
   ```
   
   ### Reproduction Results
   
   With the configuration above:
   - **Bug triggered**: 2 times during ~1 minute run
   - **Total writes**: 160,000 records (8 threads × 20,000 records)
   - **Failure rate**: ~0.00125% (2 failures / 160,000 writes)
   - **Error timestamps**: 
     - 2026-01-11 07:07:10,270
     - 2026-01-11 07:07:53,803 (43 seconds apart)
   
   ---
   
   ## Root Cause Analysis
   
   ### The Bug Mechanism
   
   The issue lies in an **offset semantic mismatch** between `BinarySection` 
and `ByteBuffer.wrap()`:
   
   1. **`BinarySection`** stores:
      - `heapMemory`: byte array reference (may be a subsection of a larger 
buffer)
      - `offset`: **logical offset relative to the original large buffer**
   
   2. **`ArrowVarCharWriter.doWrite():37`** calls:
      ```java
      ByteBuffer buffer = binarySection.wrapByteBuffer();
      ```
   
   3. **`BinarySection.wrapByteBuffer():112`** does:
      ```java
      return MemorySegment.wrap(heapMemory, offset, length);
      ```
   
   4. **`MemorySegment.wrap():252`** eventually calls:
      ```java
      ByteBuffer.wrap(heapMemory, offset, length)
      ```
   
   5. **Bug trigger**: When `heapMemory` is already a subsection:
      - `heapMemory.length` = small (e.g., 1024 bytes)
      - `offset` = large (e.g., 5000 - logical offset from original buffer)
      - **Condition**: `offset > heapMemory.length`
      - **Result**: `IndexOutOfBoundsException` in `ByteBuffer.wrap()`
   
   ### Why It's Intermittent
   
   The bug depends on memory segment reuse patterns:
   - Under high concurrency, the memory pool allocates/reuses segments 
frequently
   - When a `BinarySection` is created from a reused/subsectioned memory 
segment, the offset may exceed the subsection's bounds
   - This is a **timing-dependent** bug that manifests under buffer pressure
   
   ---
   
   ## Impact Assessment
   
   | Impact Factor | Severity | Details |
   |---------------|----------|---------|
   | **Data Loss** | 🔴 **HIGH** | Writes fail silently on tablet server; 
clients may receive false success |
   | **Consistency** | 🔴 **HIGH** | Failed writes not visible to client; no 
retry mechanism |
   | **Frequency** | 🟡 **MEDIUM** | Reproducible with moderate concurrency (8 
threads) |
   | **Detectability** | 🔴 **LOW** | Only visible in tablet server ERROR logs; 
no client-side indication |
   
   ### Production Risk
   
   - ❌ **Blocks high-concurrency workloads** (e.g., YCSB benchmarks, Redis-like 
use cases)
   - ❌ **Silent data loss**: Clients believe writes succeeded, but data not 
persisted
   - ❌ **No automatic recovery**: Failed writes require application-level retry
   - ⚠️ **Hard to diagnose**: Appears as random data loss without obvious 
pattern
   
   ---
   
   ## Proposed Solution
   
   ### Short-Term Fix (Immediate)
   
   Change `ArrowVarCharWriter.doWrite():37` from:
   ```java
   ByteBuffer buffer = binarySection.wrapByteBuffer(); // BROKEN: offset 
mismatch
   ```
   
   To:
   ```java
   byte[] bytes = binarySection.toBytes(); // SAFE: creates new copy with 
correct bounds
   ByteBuffer buffer = ByteBuffer.wrap(bytes);
   ```
   
   **Trade-off**: Introduces memory copy overhead, but guarantees correctness.
   
   ### Long-Term Fix (Architectural)
   
   Redesign offset semantics in `BinarySection` / `MemorySegment`:
   1. Ensure `heapMemory` always represents the **exact byte range** [offset, 
offset+length)
   2. Or: Store both "absolute offset" and "relative offset" explicitly
   3. Or: Change `wrapByteBuffer()` to adjust offset to be relative to 
`heapMemory` base
   
   This requires careful analysis of memory management architecture.
   
   ---
   
   ## Verification Checklist
   
   After applying a fix, verify with:
   
   1. ✅ Run reproduction program with same parameters → **0 exceptions**
   2. ✅ Increase concurrency (16-32 threads) → **0 exceptions**
   3. ✅ Extended test (1M+ records) → **0 exceptions**
   4. ✅ YCSB benchmark (real-world workload) → **0 exceptions**
   5. ✅ Check performance impact (memory copies if short-term fix used)
   
   ---
   
   ## Attached Files
   
   1. **`ArrowWalRepro.java`**: Complete standalone reproduction program
   2. **`tablet-server-error-logs.txt`**: Full stack traces from tablet server
   
   ---
   
   ## Additional Context
   
   - Discovered during Fluss 0.9 compatibility testing for CAPE 
(Cache-Augmented Persistent Engine) integration
   - Bug was first observed in YCSB Redis benchmark workloads with high write 
concurrency
   - This standalone reproduction was created to isolate the issue from the 
larger testing framework
   
   ---
   
   ## Request for Maintainers
   
   This is a **critical data integrity bug** that should be addressed before 
0.9 release. We are happy to:
   - Provide additional debugging information if needed
   - Test proposed fixes with our reproduction program
   - Contribute a PR if guidance on preferred approach is provided
   
   Thank you for your attention to this issue!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to