saurabhd336 commented on PR #3279:
URL: https://github.com/apache/celeborn/pull/3279#issuecomment-2908151789
Hi @mridulm
Sure, let me pursue this via a CIP!
a) Agreed, random seeks can be slower than sequential reads; I wanted to
propose this as an alternative for cases where the sorting overhead may be
greater. One way to tackle this is prefetching: since the chunk ids to be
fetched are known beforehand, a few chunks can be fetched in advance and cached
in memory in anticipation of the next read request(s). Of course, this is
equally applicable to all partition writer types (sorted / non-sorted /
inverted-index based).
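To make the prefetching idea concrete, here is a minimal sketch (names like `ChunkPrefetcher` and `fetchChunk` are illustrative, not from the PR): while the caller consumes the current chunk, reads for the next few known chunk ids are kicked off in the background and cached as futures.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

// Hypothetical sketch: the full list of chunk ids is known up front, so we
// schedule reads a few chunks ahead of the current position.
class ChunkPrefetcher {
    private final IntFunction<byte[]> fetchChunk;   // the actual (possibly random-seek) disk read
    private final int lookahead;                    // how many chunks to read ahead
    private final Map<Integer, Future<byte[]>> inflight = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    ChunkPrefetcher(IntFunction<byte[]> fetchChunk, int lookahead) {
        this.fetchChunk = fetchChunk;
        this.lookahead = lookahead;
    }

    // Return chunkIds[idx], scheduling the next `lookahead` ids in the background.
    byte[] read(int[] chunkIds, int idx) throws Exception {
        for (int i = idx + 1; i <= idx + lookahead && i < chunkIds.length; i++) {
            int id = chunkIds[i];
            inflight.computeIfAbsent(id, c -> pool.submit(() -> fetchChunk.apply(c)));
        }
        Future<byte[]> f = inflight.remove(chunkIds[idx]);
        return f != null ? f.get() : fetchChunk.apply(chunkIds[idx]);
    }

    void close() { pool.shutdownNow(); }
}
```

The same wrapper would work regardless of the partition writer type, since it only depends on the chunk id sequence being known in advance.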
b) Again agreed; I had added some rough estimates in the doc and will work on
adding more concrete numbers. The overheads are:
1) Header data buffer (8 + 8 + 4 bytes)
2) Chunk offsets buffer (8 bytes * number of chunks). To keep this one in
check, I had thought about aggregating data for a given mapper id in memory
chunks of, say, 2 MB each before flushing to disk. The "in-memory" part is not
that important, and the buffer may be backed by mmapped temporary files (to
keep memory usage low). If we follow this model, a 10 GB data file (the default
split partition size) -> 5000 chunks -> 5000 * 8 bytes = 40 KB of overhead for
offsets.
3) The inverted index buffer has two parts:
i) Bitmap offset buffer -> This is deterministic (8 bytes * number of
mappers). For 10k mappers -> 80 KB.
ii) Actual serialized bitmaps -> Proportional to the number of mappers. I
don't have a good way of estimating this yet, but I've seen that the
serialized size of even 10,000 bitmaps is pretty low (< 2 MB). Let me add some
more benchmarks and numbers.
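The deterministic parts of the estimate above can be checked with a few lines of arithmetic (the 2 MB chunk size, 10 GB split size, and 10k mapper count are the example values from this comment, not fixed constants):

```java
// Worked check of the per-file overhead estimates: one 8-byte offset per
// chunk, and one 8-byte bitmap offset per mapper, plus a fixed 20-byte header.
class IndexOverhead {
    static long chunkOffsetsBytes(long fileBytes, long chunkBytes) {
        long numChunks = fileBytes / chunkBytes;   // 10 GB / 2 MB -> 5000 chunks
        return numChunks * 8;                      // 8-byte offset per chunk
    }

    static long bitmapOffsetBytes(int numMappers) {
        return numMappers * 8L;                    // 8-byte offset per mapper
    }

    public static void main(String[] args) {
        long header = 8 + 8 + 4;                                         // 20 bytes
        System.out.println(chunkOffsetsBytes(10_000_000_000L, 2_000_000L)); // 40000 (~40 KB)
        System.out.println(bitmapOffsetBytes(10_000));                      // 80000 (~80 KB)
        System.out.println(header);                                         // 20
    }
}
```

Only the serialized bitmap sizes are data-dependent; everything else is fixed once the chunk size and mapper count are known.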
Another aspect I'm trying to improve is the need to synchronize the actual
[writeBufferToFile](https://github.com/apache/celeborn/pull/3279/files#diff-48761c959e94314b53865866b33156b04a97461d2860da333711c6fbef865fbdR108)
method, since the exact offset and chunk id are needed during a buffer flush.
The fact that we buffer into an in-memory chunk buffer before flushing helps a
little, but I'm working on improving this further.
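One possible direction, sketched here under my own assumptions (the names are illustrative, not the PR's actual API): hold the lock only long enough to reserve an (offset, chunk id) pair for the buffer, then perform the write outside the lock with a positional write, which `FileChannel` supports safely from multiple threads.

```java
// Hypothetical sketch: shrink the critical section of the flush path to
// metadata reservation only; the bulk I/O then happens without the lock.
class OffsetReservingWriter {
    private long nextOffset = 0;
    private int nextChunkId = 0;

    static final class Reservation {
        final long offset;
        final int chunkId;
        Reservation(long offset, int chunkId) { this.offset = offset; this.chunkId = chunkId; }
    }

    // Short critical section: record the chunk id and bump the file offset.
    synchronized Reservation reserve(int bufferLength) {
        Reservation r = new Reservation(nextOffset, nextChunkId++);
        nextOffset += bufferLength;
        return r;
    }

    // The caller then writes at r.offset without holding the lock, e.g.:
    //   channel.write(buffer, r.offset);  // FileChannel positional write
}
```

The trade-off is that chunks may land on disk out of reservation order, so any reader or finalizer that assumes sequential completion would need to wait for outstanding writes.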