Re: [PR] [QDP] Integrate Apache Arrow for data processing [mahout]

via GitHub Thu, 04 Dec 2025 18:24:49 -0800


guan404ming commented on PR #680:
URL: https://github.com/apache/mahout/pull/680#issuecomment-3615052308


   > Looks good, but the current implementation forces a memory copy (Vec
   > allocation) even when we want to use Arrow directly. We should refactor
   > io.rs so that read_parquet_to_arrow is the base implementation, ensuring
   > true zero-copy performance for the pipeline.
   >
   > origin:  Disk -> Arrow -> Vec (copy) -> Arrow (copy) -> GPU
   > we need: Disk -> Arrow -> Arrow (Zero-copy Reference) -> GPU (through Pointer)
   >
   > I think so, plz correct me if I'm wrong.
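   To make the copy-vs-reference distinction in the quote concrete, here is a minimal, std-only Rust sketch. It does not use the real `arrow` crate: `ArrowLikeBuffer`, `read_to_arrow_like`, `copy_path` and `zero_copy_path` are hypothetical names standing in for an immutable, reference-counted Arrow buffer and the two pipeline shapes. The check is simply whether the data pointer seen by the next stage is the same one the decode step produced.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a decoded Arrow Float64 buffer: immutable and
// reference-counted. This is NOT the arrow crate's API.
#[derive(Clone)]
struct ArrowLikeBuffer {
    data: Arc<[f64]>,
}

impl ArrowLikeBuffer {
    // Borrow the values without copying them.
    fn values(&self) -> &[f64] {
        &self.data
    }
}

// Hypothetical "decode" step: the one place where values are written into the buffer.
fn read_to_arrow_like(decoded: Vec<f64>) -> ArrowLikeBuffer {
    ArrowLikeBuffer { data: Arc::from(decoded) }
}

// Old shape: Arrow -> Vec (copy). Allocates a new Vec and copies every value.
fn copy_path(buf: &ArrowLikeBuffer) -> Vec<f64> {
    buf.values().to_vec()
}

// Desired shape: Arrow -> Arrow (zero-copy reference). Returns a view of the same memory.
fn zero_copy_path(buf: &ArrowLikeBuffer) -> &[f64] {
    buf.values()
}

fn main() {
    let buf = read_to_arrow_like(vec![1.0, 2.0, 3.0, 4.0]);
    let base_ptr = buf.values().as_ptr();

    let copied = copy_path(&buf);
    let borrowed = zero_copy_path(&buf);
    let shared = buf.clone(); // only bumps the reference count

    println!("copy path shares memory: {}", copied.as_ptr() == base_ptr); // false
    println!("zero-copy path shares memory: {}", borrowed.as_ptr() == base_ptr); // true
    println!("clone shares memory: {}", shared.values().as_ptr() == base_ptr); // true
}
```

   Cloning the `Arc` only increments a counter, which is why passing buffers between stages can stay copy-free; the `to_vec()` call is the kind of extra allocation the review asks to remove.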
   
   
   I think you're right, thanks for pointing that out. I've updated the
   implementation so the pipeline is now only:
   
   Disk  --(decode)-->  Arrow Buffers (pointer only)  -->  GPU
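   The "pointer only" hand-off can be sketched the same way. A host-to-device copy such as CUDA's `cudaMemcpy` takes a source pointer and a byte count, so the column does not need to be materialized into an intermediate `Vec` first. In this std-only sketch, `FakeDeviceBuffer`, `upload_from_raw` and `send_column` are hypothetical placeholders for real device memory and the actual upload call; the single copy inside `upload_from_raw` stands in for the transfer to the GPU.

```rust
// Hypothetical stand-in for device memory; a real implementation would
// allocate on the GPU instead of using a host Vec.
struct FakeDeviceBuffer {
    bytes: Vec<u8>,
}

/// Hypothetical upload that, like a host-to-device memcpy, needs only a raw
/// pointer and a byte length. The copy here models the transfer itself.
///
/// # Safety
/// `ptr` must be valid for reads of `len_bytes` bytes for the whole call.
unsafe fn upload_from_raw(ptr: *const u8, len_bytes: usize) -> FakeDeviceBuffer {
    let host = std::slice::from_raw_parts(ptr, len_bytes);
    FakeDeviceBuffer { bytes: host.to_vec() }
}

// Hand an f64 column to the "GPU" by pointer, with no intermediate Vec<f64>.
fn send_column(values: &[f64]) -> FakeDeviceBuffer {
    let ptr = values.as_ptr() as *const u8;
    let len_bytes = std::mem::size_of_val(values);
    // SAFETY: `values` stays borrowed for the entire call, so `ptr` remains valid.
    unsafe { upload_from_raw(ptr, len_bytes) }
}

fn main() {
    // Pretend these values live in a decoded Arrow buffer.
    let column: Vec<f64> = vec![0.5, 1.5, 2.5];
    let dev = send_column(&column);
    println!("bytes transferred: {}", dev.bytes.len()); // 24

    // Read the first value back to check the transfer preserved it.
    let first = f64::from_ne_bytes(dev.bytes[0..8].try_into().unwrap());
    println!("first value on device: {}", first); // 0.5
}
```

   The borrow on `values` is what keeps the raw pointer valid; in a real pipeline the Arrow buffer would similarly have to outlive the asynchronous copy it feeds.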


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
