gianm commented on issue #12022:
URL: https://github.com/apache/druid/issues/12022#issuecomment-985604924


   Other than the point in the comment above about the aggregator interface, 
this design looks good to me.
   
   There's one potential issue I wanted to bring up: right now, there are _other_ 
sources of overhead that the estimation doesn't capture. It's mostly fine 
because the current estimation is an overestimate, so it "covers" in a sense 
for other things that are not explicitly estimated. So when we make these 
estimates more accurate, we may find we need to add estimation for more kinds 
of overhead.
   
   One example is the building of bitmaps during persist, which IIRC happens on 
heap. I've noticed in the past that this can have a substantial footprint too. 
You should be able to repro a case where the footprint is big by having a 
multi-value column that has 100+ values per row. I was working with a dataset 
like that when I first noticed this bitmap overhead. For this particular one, 
maybe we can set a max dictionary size?
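
   For anyone who wants to try that repro, here is a rough sketch of a script 
that generates such a dataset as newline-delimited JSON. Everything in it is an 
assumption for illustration, not the dataset I was using: the column name 
`tags`, the 150 values per row, the cardinality, and the timestamp base are all 
made up, and you would still need to write your own ingestion spec that reads 
`tags` as a multi-value string dimension.

```python
import json
import random


def make_rows(num_rows, values_per_row=150, cardinality=100_000, seed=0):
    """Yield rows with one multi-value string column.

    The column name "tags" and all default sizes are hypothetical;
    tune them until the on-heap footprint during persist is visible.
    """
    rng = random.Random(seed)
    base_millis = 1638316800000  # arbitrary epoch-millis starting point
    for i in range(num_rows):
        # Sample without replacement so each row carries exactly
        # values_per_row distinct values.
        picked = rng.sample(range(cardinality), values_per_row)
        yield {
            "timestamp": base_millis + i,
            "tags": [f"tag_{v}" for v in picked],
        }


def write_ndjson(path, num_rows, **kwargs):
    """Write the generated rows to path, one JSON object per line."""
    with open(path, "w") as f:
        for row in make_rows(num_rows, **kwargs):
            f.write(json.dumps(row) + "\n")
```

   Calling `write_ndjson("multi_value.json", 1_000_000)` would produce a file 
to ingest; raising `cardinality` grows the dictionary, and raising 
`values_per_row` grows the number of row entries spread across the bitmaps.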


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]