mcvsubbu commented on pull request #6479:
URL: https://github.com/apache/incubator-pinot/pull/6479#issuecomment-771020682


   > Thanks for the details @jackjlli. Could you also explain how IntermediateSegment is
   > better than the existing MutableSegment? For example, you could stream input data
   > to MutableSegment and flush it as needed. This also solves multiple problems:
   > 
   > * Common code base for offline and RT segment generation (at least for the streaming part).
   > * Sorting can now be done for offline within SegmentGeneration, instead of requiring users to do it explicitly.
   > * The auto segment sizing that happens in RT can now be done for offline as well.
   > 
   > Thoughts @jackjlli @Jackie-Jiang?
   
   I think this is a good idea to explore, but I suspect memory utilization on the offline side may go up significantly.
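To make the stream-and-flush proposal concrete, here is a minimal sketch. It is not Pinot's API: the class `StreamingSegmentBuilder`, its `Row` type, and the row-count threshold are hypothetical, and a plain in-memory list stands in for a MutableSegment. Rows are buffered until a threshold is hit, then sorted on a sort key and sealed as one segment. The buffer size is the knob that bounds memory, which is where the utilization concern above comes in; note also that sorting happens per flushed segment, not globally across the input.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch: buffer incoming rows in memory (standing in for a
 * MutableSegment) and flush a sorted chunk whenever the buffer reaches a
 * row threshold. Not Pinot's actual API.
 */
public class StreamingSegmentBuilder {

    /** A minimal row with a sort key and a value (hypothetical schema). */
    public static final class Row {
        public final long sortKey;
        public final String value;

        public Row(long sortKey, String value) {
            this.sortKey = sortKey;
            this.value = value;
        }
    }

    private final int maxRowsPerSegment;
    private final List<Row> buffer = new ArrayList<>();
    private final List<List<Row>> flushedSegments = new ArrayList<>();

    public StreamingSegmentBuilder(int maxRowsPerSegment) {
        if (maxRowsPerSegment <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.maxRowsPerSegment = maxRowsPerSegment;
    }

    /** Append one row; flush automatically once the buffer hits the threshold. */
    public void index(Row row) {
        buffer.add(row);
        if (buffer.size() >= maxRowsPerSegment) {
            flush();
        }
    }

    /** Sort buffered rows on the sort key and seal them as one segment. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<Row> segment = new ArrayList<>(buffer);
        segment.sort(Comparator.comparingLong((Row r) -> r.sortKey));
        flushedSegments.add(segment);
        buffer.clear(); // memory held is bounded by maxRowsPerSegment rows
    }

    public List<List<Row>> segments() {
        return flushedSegments;
    }
}
```

A caller would call `index` for each input row and a final `flush` at end of input to seal the partial last segment.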
   
   Also, the auto segment sizing in realtime is implemented (in the controller) by learning from the history of segments already completed. For offline generation, if we can keep a history or some other learning mechanism, it may be possible to implement approximate segment sizing algorithms, whether or not we use MutableSegment to build segments.
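One way such history-based sizing could look for offline generation is sketched below, under stated assumptions: it is not the controller's realtime algorithm, and the class `SegmentSizeEstimator`, its parameters, and the exponential smoothing scheme are all hypothetical choices for illustration. It keeps a smoothed bytes-per-row estimate from completed segments and derives a row threshold for the next segment from a target segment size.

```java
/**
 * Hypothetical sketch of history-based segment sizing for offline generation.
 * Keeps an exponentially weighted bytes-per-row estimate learned from
 * completed segments and uses it to choose the next segment's row threshold.
 */
public class SegmentSizeEstimator {
    private final long targetSegmentBytes;
    private final double newSampleWeight; // weight given to the latest segment, in (0, 1]
    private final int defaultRows;        // used until any history exists
    private double bytesPerRow = -1;      // negative means "no history yet"

    public SegmentSizeEstimator(long targetSegmentBytes, double newSampleWeight, int defaultRows) {
        if (targetSegmentBytes <= 0 || defaultRows <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        if (newSampleWeight <= 0 || newSampleWeight > 1) {
            throw new IllegalArgumentException("weight must be in (0, 1]");
        }
        this.targetSegmentBytes = targetSegmentBytes;
        this.newSampleWeight = newSampleWeight;
        this.defaultRows = defaultRows;
    }

    /** Record a completed segment's row count and size on disk. */
    public void recordCompletedSegment(long numRows, long sizeBytes) {
        if (numRows <= 0) {
            return; // nothing to learn from an empty segment
        }
        double observed = (double) sizeBytes / numRows;
        bytesPerRow = bytesPerRow < 0
                ? observed
                : newSampleWeight * observed + (1 - newSampleWeight) * bytesPerRow;
    }

    /** Rows to put in the next segment so it lands near the target size. */
    public int nextRowThreshold() {
        if (bytesPerRow <= 0) {
            return defaultRows;
        }
        long rows = Math.round(targetSegmentBytes / bytesPerRow);
        return (int) Math.max(1, Math.min(Integer.MAX_VALUE, rows));
    }
}
```

For example, with a 1000-byte target and weight 0.5, a first segment of 100 rows and 500 bytes gives 5 bytes/row and a threshold of 200 rows; a second segment of 100 rows and 1500 bytes moves the estimate to 10 bytes/row and the threshold to 100 rows. The smoothing weight trades responsiveness to changing data against stability.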


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.