yugan95 commented on PR #7621:
URL: https://github.com/apache/paimon/pull/7621#issuecomment-4533293854

   @JingsongLi Thanks for the detailed per-fix comments! Addressing each point:
   
     **1. RowHelper — `getSegments().size()` semantics**
   
     Good question. `AbstractBinaryWriter.getSegments()` returns a single 
`MemorySegment`
     (not a list/array — the method name is slightly misleading), and 
`MemorySegment.size()`
     returns the byte size of the underlying buffer. So the comparison
     `reuseWriter.getSegments().size() > 4MB` correctly checks byte count, and 
the threshold
     matches.
   
     I've also added a hysteresis guard per @LsomeYeah's feedback: 
`resetIfTooLarge()` now
     additionally checks `reuseRow.getSizeInBytes() < threshold`, so it only 
releases when
     the last written record was small. This avoids thrashing for sustained 
large-record
     workloads while still reclaiming memory when the workload transitions back.
   
     **Configurability:** I think a fixed 4MB is sufficient for the initial 
fix. If operators
      with unusual memory profiles need tuning, we can add a system property in 
a follow-up.
     Keeping the initial change minimal reduces merge risk.
   
     **2. BinaryRowSerializer shrink** — now improved with hysteresis. The 
`length <
     threshold` guard means consistent large-record workloads reuse the buffer 
without
     reallocation, addressing the "allocating every time" concern you noted.
   
     **3. HeapBytesVector** — addressed in commit f1954ffd.
   
     **4. Parquet config pass-through** — moved to #7956.
   
     **5. Performance overhead** — the check is 1 null check + 2 int 
comparisons, all on hot
     fields already in cache. For small-record workloads (buffer < 4MB), 
`resetIfTooLarge()`
     short-circuits at the first size check — no branch taken, no allocation. 
This is
     comparable to an array bounds check and will be inlined by the JIT. Happy 
to run a JMH
     benchmark if the team wants hard numbers, but I don't expect a measurable 
delta.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to