Re: [PR] [QDP] Integrate Apache Arrow for data processing [mahout]

via GitHub Fri, 05 Dec 2025 07:09:26 -0800


guan404ming commented on PR #680:
URL: https://github.com/apache/mahout/pull/680#issuecomment-3617300782


   After our offline discussion and investigation, we confirmed that Parquet 
cannot achieve true zero-copy from disk to memory because its data is stored in 
compressed and encoded form and must be decoded before use. We’ll continue 
using Parquet for now given its convenience and practicality, and I’ve updated 
the TODO to migrate the encoder to a chunk-based API. 
   
   I will follow up on Arrow IPC as a potential path toward a real zero-copy 
data pipeline.
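
   To illustrate the distinction in isolation, here is a minimal stdlib-only sketch. It uses neither Parquet nor Arrow IPC: a raw float64 file stands in for an uncompressed, memory-mappable layout, and a zlib-compressed file stands in for an encoded one. All names (`write_file`, `raw_path`, `packed_path`) are hypothetical and not part of the QDP code.

```python
import array
import mmap
import os
import tempfile
import zlib

# Sample column of float64 values (native byte order).
values = array.array("d", [0.5, 1.5, 2.5, 3.5])


def write_file(payload: bytes) -> str:
    """Write payload to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    return path


# Stand-in for an uncompressed on-disk layout: bytes on disk equal bytes in memory.
raw_path = write_file(values.tobytes())
# Stand-in for a compressed/encoded layout: bytes on disk must be decoded first.
packed_path = write_file(zlib.compress(values.tobytes()))

# Uncompressed path: map the file and reinterpret the mapped pages as doubles.
# No intermediate buffer holding the column is allocated.
with open(raw_path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    base = memoryview(mapped)
    view = base.cast("d")      # a typed view over the mapping, not a copy
    raw_sum = sum(view)
    view.release()
    base.release()
    mapped.close()

# Compressed path: decompression necessarily materializes a new buffer,
# so a copy-free read is not possible here.
with open(packed_path, "rb") as f:
    decoded = array.array("d")
    decoded.frombytes(zlib.decompress(f.read()))

os.remove(raw_path)
os.remove(packed_path)
```

   The second branch is the situation described above for Parquet: whatever the reader does, the decode step produces fresh memory. The first branch is the property a zero-copy pipeline would rely on.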


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
