Re: [PR] [QDP] Integrate Apache Arrow for data processing [mahout]

via GitHub Thu, 04 Dec 2025 17:30:37 -0800


rich7420 commented on PR #680:
URL: https://github.com/apache/mahout/pull/680#issuecomment-3614951502


   Looks good, but the current implementation forces a memory copy (a Vec 
allocation) even when we want to use the Arrow data directly. We should 
refactor io.rs so that read_parquet_to_arrow is the base implementation, 
which would give the pipeline a true zero-copy path.
   Current:  Disk -> Arrow -> Vec (copy) -> Arrow (copy) -> GPU
   Proposed: Disk -> Arrow -> Arrow (zero-copy reference) -> GPU (through 
a pointer)
   I think that's right, but please correct me if I'm wrong.
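
   To make the difference between the two paths concrete, here is a minimal 
sketch using only the Rust standard library. It is a model, not the arrow 
crate API and not the code in this PR: ColumnBuffer stands in for an Arrow 
buffer, and read_to_buffer, read_to_vec, via_vec and via_reference are 
hypothetical names chosen for illustration.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow buffer: immutable, reference-counted
/// f64 storage. The real pipeline would use the arrow crate's array types.
#[derive(Clone)]
struct ColumnBuffer {
    data: Arc<[f64]>,
}

impl ColumnBuffer {
    /// Builds a buffer from a Vec. Converting Vec<f64> into Arc<[f64]>
    /// allocates new storage and copies the elements.
    fn from_vec(values: Vec<f64>) -> Self {
        ColumnBuffer { data: Arc::from(values) }
    }

    /// Raw pointer to the first element, i.e. what a GPU upload call
    /// would be handed in the proposed path.
    fn as_ptr(&self) -> *const f64 {
        self.data.as_ptr()
    }

    fn len(&self) -> usize {
        self.data.len()
    }
}

/// Hypothetical base reader, modelling "Disk -> Arrow".
fn read_to_buffer(raw: &[f64]) -> ColumnBuffer {
    ColumnBuffer::from_vec(raw.to_vec())
}

/// Convenience reader layered on the base one, for callers that really
/// want an owned Vec. The copy happens only here, on request.
fn read_to_vec(raw: &[f64]) -> Vec<f64> {
    read_to_buffer(raw).data.to_vec()
}

/// Models the current path: Arrow -> Vec (copy) -> Arrow (copy).
fn via_vec(buf: &ColumnBuffer) -> ColumnBuffer {
    let owned: Vec<f64> = buf.data.to_vec(); // first copy
    ColumnBuffer::from_vec(owned) // second copy into a new allocation
}

/// Models the proposed path: Arrow -> Arrow (zero-copy reference).
/// Cloning the Arc bumps a reference count; no element is copied.
fn via_reference(buf: &ColumnBuffer) -> ColumnBuffer {
    buf.clone()
}
```

   In this model, via_reference returns a buffer whose as_ptr() equals the 
original's, while via_vec returns one backed by a different allocation. The 
Vec-returning reader still exists, but it is built on top of the 
buffer-returning one rather than the other way around.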


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
