mcvsubbu opened a new issue #4036: Reduce heap usage when building realtime 
segments
URL: https://github.com/apache/incubator-pinot/issues/4036
 
 
   Reduce heap usage while building completed segments. Currently, the 
segment builder reads incoming data row by row and builds each dictionary in 
a hash table before translating it to the on-disk dictionary format. We can 
bypass these steps because we already have the segment in columnar format 
(realtime consumers ingest rows but store them in a columnar format for 
serving queries). An initial prototype has shown a significant reduction in 
heap usage during segment builds. If we reduce heap usage (or, better yet, 
move completely to off-heap segment completion), more segments can be packed 
onto a single host, saving hardware cost. If higher latency can be tolerated, 
these hosts could use SSDs and map off-heap memory from files (Pinot already 
provides primitives for doing this).
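
   To illustrate the idea of building from the columnar form, here is a 
minimal sketch. It is not the prototype's code and does not use Pinot's 
classes: `MutableColumn`, `SortedColumn`, and `build` are hypothetical names, 
and the sketch assumes the in-memory column holds its distinct values in 
arrival order plus one dictionary id per row. Under that assumption, the 
sorted dictionary can be produced by sorting the existing distinct values and 
remapping ids, without re-reading rows or filling a new hash table.

```java
import java.util.Arrays;

// Hypothetical sketch: none of these types are Pinot classes.
final class ColumnarDictionarySketch {

    /** Assumed shape of one column in an in-memory realtime segment. */
    static final class MutableColumn {
        final String[] dictionary;  // distinct values, in arrival order
        final int[] forwardIndex;   // dictionary id for each row

        MutableColumn(String[] dictionary, int[] forwardIndex) {
            this.dictionary = dictionary;
            this.forwardIndex = forwardIndex;
        }
    }

    /** Column with a sorted dictionary, as a completed segment would need. */
    static final class SortedColumn {
        final String[] sortedDictionary;
        final int[] forwardIndex;

        SortedColumn(String[] sortedDictionary, int[] forwardIndex) {
            this.sortedDictionary = sortedDictionary;
            this.forwardIndex = forwardIndex;
        }
    }

    /** Builds the sorted form directly from the column, never from rows. */
    static SortedColumn build(MutableColumn col) {
        int cardinality = col.dictionary.length;

        // Sort the old dictionary ids by the values they point to.
        Integer[] order = new Integer[cardinality];
        for (int i = 0; i < cardinality; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> col.dictionary[a].compareTo(col.dictionary[b]));

        // Emit the sorted dictionary and record the old-id to new-id mapping.
        String[] sorted = new String[cardinality];
        int[] oldToNew = new int[cardinality];
        for (int newId = 0; newId < cardinality; newId++) {
            int oldId = order[newId];
            sorted[newId] = col.dictionary[oldId];
            oldToNew[oldId] = newId;
        }

        // Rewrite the forward index with the new ids in one pass.
        int[] remapped = new int[col.forwardIndex.length];
        for (int row = 0; row < remapped.length; row++) {
            remapped[row] = oldToNew[col.forwardIndex[row]];
        }
        return new SortedColumn(sorted, remapped);
    }

    public static void main(String[] args) {
        MutableColumn col = new MutableColumn(
                new String[] {"pear", "apple", "fig"}, new int[] {0, 1, 0, 2});
        SortedColumn out = build(col);
        System.out.println(Arrays.toString(out.sortedDictionary));
        System.out.println(Arrays.toString(out.forwardIndex));
    }
}
```

   In this sketch the temporary state is proportional to the column's 
cardinality (the id ordering and the old-to-new mapping) rather than to a 
hash table populated row by row. The arrays here live on the heap only to 
keep the example short; the same pass could in principle write to off-heap 
or memory-mapped buffers.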
   
   Prototype: 
https://github.com/apache/incubator-pinot/tree/columnar-segment-builder

