Re: [PR] [DRAFT] [FEATURE] Inverted index based reduce partition data writer / reader [celeborn]

via GitHub Sun, 25 May 2025 15:09:49 -0700


mridulm commented on PR #3279:
URL: https://github.com/apache/celeborn/pull/3279#issuecomment-2908117292


   A couple of things to watch out for, based on a quick read of the doc:
   
   a) A large number of small, random reads has very bad performance 
characteristics at scale (this is why data is kept reduce-oriented, or sorted).
   
   b) As the number of mappers and reducers increases (and so the data per 
reducer in each mapper's output, and vice versa), we have to ensure that the 
overhead of maintaining the mapping is kept reasonable.
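
   To make both points concrete, here is a rough back-of-envelope sketch. It 
is not taken from the PR or the design doc: it assumes one index entry per 
(mapper, reducer) pair and a hypothetical 16 bytes per entry (an 8-byte offset 
plus an 8-byte length), and the function names are made up for illustration.

```python
def index_overhead(num_mappers: int, num_reducers: int,
                   bytes_per_entry: int = 16) -> dict:
    """Estimate the size of a mapper-to-reducer mapping.

    Assumption: one entry per (mapper, reducer) pair; bytes_per_entry is a
    hypothetical figure, not a value from the PR.
    """
    entries = num_mappers * num_reducers
    return {"entries": entries, "index_bytes": entries * bytes_per_entry}


def avg_read_size(total_shuffle_bytes: int, num_mappers: int,
                  num_reducers: int) -> float:
    """Average bytes per read if every (mapper, reducer) segment is read
    separately, i.e. the data is not laid out contiguously per reducer."""
    return total_shuffle_bytes / (num_mappers * num_reducers)


if __name__ == "__main__":
    mappers, reducers = 10_000, 10_000
    print(index_overhead(mappers, reducers))
    print(avg_read_size(2**40, mappers, reducers))  # 1 TiB of shuffle data
```

   Under these assumptions, 10,000 mappers and 10,000 reducers give 100 
million entries (about 1.6 GB of mapping data), and 1 TiB of shuffle data 
breaks down into reads of roughly 11 KB each. The real numbers depend 
entirely on the actual index layout and read path.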
   
   
   I would suggest submitting this via a CIP; it would be good to get broader 
community feedback.
   
   Thanks for working on this!


