jja725 commented on issue #18308:
URL: https://github.com/apache/hudi/issues/18308#issuecomment-4664593343

   Hi @xushiyan, nice to see you again in the community.
   One concern: memory accounting across the FFI boundary. Even if RecordBatch 
exchange is zero-copy, the buffer allocations happen outside Velox's 
MemoryPool. Velox's memory arbitrator can't see them, so it won't throttle, 
spill, or reclaim against them. From Velox's accounting perspective the 
operator looks empty while real RSS keeps climbing, and the first thing to 
notice is the OS, via an OOM kill, not Velox.
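   
   To make the gap concrete, here is a toy Rust model of it. This is only a 
sketch: `TrackedPool` and `read_batch_untracked` are hypothetical names 
invented for illustration, not Velox or hudi-rs APIs. The pool only knows 
about bytes that are explicitly reported to it, while the reader allocates 
through Rust's global allocator and never reports anything.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for an engine-side memory pool. It only knows
/// about bytes that were explicitly reserved through it.
struct TrackedPool {
    used: AtomicUsize,
    limit: usize,
}

impl TrackedPool {
    fn new(limit: usize) -> Self {
        Self { used: AtomicUsize::new(0), limit }
    }

    /// Reserve `bytes` if that keeps usage within the limit; otherwise refuse.
    fn try_reserve(&self, bytes: usize) -> bool {
        let mut cur = self.used.load(Ordering::Relaxed);
        loop {
            if cur + bytes > self.limit {
                return false;
            }
            match self.used.compare_exchange(
                cur,
                cur + bytes,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return true,
                Err(actual) => cur = actual,
            }
        }
    }

    fn used(&self) -> usize {
        self.used.load(Ordering::Relaxed)
    }
}

/// Hypothetical stand-in for a reader behind the FFI boundary: it allocates
/// through Rust's global allocator and never tells the pool about it.
fn read_batch_untracked(bytes: usize) -> Vec<u8> {
    vec![0u8; bytes]
}

fn main() {
    let pool = TrackedPool::new(64 * 1024);

    // Four 32 KiB batches are held alive, twice the pool's limit.
    let batches: Vec<Vec<u8>> = (0..4).map(|_| read_batch_untracked(32 * 1024)).collect();
    let held: usize = batches.iter().map(|b| b.len()).sum();

    println!("pool accounts for {} bytes, batches hold {} bytes", pool.used(), held);

    // Had the pool been asked, it would have refused this amount.
    assert!(!pool.try_reserve(held));
}
```

   The pool reports zero usage the whole time, so nothing in the engine has 
a reason to throttle or spill.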
   
   We hit exactly this with the early arrow-parquet reader integration in Velox 
a few years back. The fix there was ultimately to bring the reader inside 
Velox's allocation umbrella (which is how the native Parquet reader exists 
today). Anything routed through arrow-rs / object_store / Rust's global 
allocator will reproduce the same failure mode unless the design accounts for 
it up front.
   
   A few directions worth considering before locking in the design:
   
   1. Custom allocator on the Rust side. arrow-rs Buffer supports custom 
allocations — hudi-rs could optionally allocate through a Velox-supplied 
allocator (Velox's MemoryPool* exposed as a C ABI: alloc(size) -> void*, 
free(ptr, size)). Adds complexity but makes accounting exact.
   2. Memory ticket / bounded reservation. Velox reserves N bytes in its pool 
before each next() call; the FFI returns batches sized to fit; the reservation 
is released when the RowVector is destroyed. Approximate but enforceable, and 
survives backpressure scenarios.
   3. A native C++ Hudi client, so that memory allocation is controlled by 
Velox natively.
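   
   As a rough sketch of what direction 1 could look like from the Rust side, 
the block below defines a C ABI allocator table matching the 
alloc(size) / free(ptr, size) shape above, plus a buffer type that returns its 
memory to the host on drop. All names here (`HostAllocator`, `HostBuffer`, 
`MockPool`) are hypothetical and exist only for illustration; the mock pool 
stands in for Velox's MemoryPool, and the sketch does not use arrow-rs or any 
real Velox API. A real integration would also need to agree on alignment, 
threading, and failure behavior.

```rust
use std::alloc::Layout;
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical C ABI table the engine would hand to the Rust reader.
#[repr(C)]
pub struct HostAllocator {
    pub ctx: *mut c_void,
    pub alloc: extern "C" fn(ctx: *mut c_void, size: usize) -> *mut c_void,
    pub free: extern "C" fn(ctx: *mut c_void, ptr: *mut c_void, size: usize),
}

/// Byte buffer whose memory comes from the host allocator and is handed
/// back to it on drop, so the host's accounting stays exact.
pub struct HostBuffer<'a> {
    ptr: *mut u8,
    len: usize,
    host: &'a HostAllocator,
}

impl<'a> HostBuffer<'a> {
    /// Returns None for zero-length requests or when the host refuses.
    pub fn new(host: &'a HostAllocator, len: usize) -> Option<Self> {
        if len == 0 {
            return None;
        }
        let ptr = (host.alloc)(host.ctx, len) as *mut u8;
        if ptr.is_null() {
            return None;
        }
        // Zero the bytes so the slice never exposes uninitialized memory.
        unsafe { std::ptr::write_bytes(ptr, 0, len) };
        Some(Self { ptr, len, host })
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.len) }
    }
}

impl Drop for HostBuffer<'_> {
    fn drop(&mut self) {
        (self.host.free)(self.host.ctx, self.ptr as *mut c_void, self.len);
    }
}

// --- Mock host side, standing in for the engine's memory pool ---

const ALIGN: usize = 64; // assumed alignment for this sketch

struct MockPool {
    used: AtomicUsize,
}

extern "C" fn mock_alloc(ctx: *mut c_void, size: usize) -> *mut c_void {
    let pool = unsafe { &*(ctx as *const MockPool) };
    if size == 0 {
        return std::ptr::null_mut();
    }
    let layout = match Layout::from_size_align(size, ALIGN) {
        Ok(l) => l,
        Err(_) => return std::ptr::null_mut(),
    };
    let p = unsafe { std::alloc::alloc(layout) };
    if !p.is_null() {
        pool.used.fetch_add(size, Ordering::Relaxed);
    }
    p as *mut c_void
}

extern "C" fn mock_free(ctx: *mut c_void, ptr: *mut c_void, size: usize) {
    let pool = unsafe { &*(ctx as *const MockPool) };
    let layout = Layout::from_size_align(size, ALIGN).expect("same layout as alloc");
    unsafe { std::alloc::dealloc(ptr as *mut u8, layout) };
    pool.used.fetch_sub(size, Ordering::Relaxed);
}

fn main() {
    let pool = MockPool { used: AtomicUsize::new(0) };
    let host = HostAllocator {
        ctx: &pool as *const MockPool as *mut c_void,
        alloc: mock_alloc,
        free: mock_free,
    };
    {
        let mut buf = HostBuffer::new(&host, 4096).expect("allocation failed");
        buf.as_mut_slice()[0] = 7;
        println!("while alive: {} bytes tracked", pool.used.load(Ordering::Relaxed));
    }
    println!("after drop: {} bytes tracked", pool.used.load(Ordering::Relaxed));
}
```

   The point of the sketch is that every byte the reader holds is visible to 
the host for the buffer's whole lifetime, which is what the arbitrator needs 
in order to throttle or reclaim.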
   
   Whichever direction the community goes, getting this into the design doc 
before code lands will save a lot of pain — retrofitting allocator hooks into a 
working Rust pipeline is much harder than designing for it. 

