gianm commented on issue #12022: URL: https://github.com/apache/druid/issues/12022#issuecomment-985604924
Other than the point in the comment above about the aggregator interface, this design looks good to me.

There's one potential issue I wanted to bring up: right now, there are _other_ sources of overhead that the estimation doesn't capture. It's mostly fine today because the current estimation is an overestimate, so in a sense it "covers" for other things that are not explicitly estimated. Once we make these estimates more accurate, we may find we need to add estimation for more kinds of things.

One example is the building of bitmaps during persist, which IIRC happens on heap. I've noticed in the past that this can have a substantial footprint too. You should be able to repro a case where the footprint is big by having a multi-value column that has 100+ values per row. I was working with a dataset like that when I first noticed this bitmap thing. For this particular one, maybe we can set a max dictionary size?
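
For anyone who wants to try the repro, here is a rough sketch of generating such a dataset. It assumes newline-delimited JSON input in which a JSON array of strings is ingested as a multi-value string dimension. The column name `tags`, the default cardinality, the timestamps, and the helper names are all made up for illustration and are not from this issue.

```python
import json
import random
from datetime import datetime, timedelta, timezone


def make_rows(num_rows, values_per_row=128, cardinality=100_000, seed=0):
    """Yield rows that each carry one multi-value dimension with many values.

    `tags` is a hypothetical column name; values_per_row defaults to 128 so
    every row is above the "100+ values per row" mark mentioned above.
    """
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    start = datetime(2021, 12, 1, tzinfo=timezone.utc)  # arbitrary start time
    for i in range(num_rows):
        yield {
            "timestamp": (start + timedelta(seconds=i)).isoformat(),
            # A JSON array of strings, intended to become a multi-value dimension.
            "tags": [f"v{rng.randrange(cardinality)}" for _ in range(values_per_row)],
            "count": 1,
        }


def write_ndjson(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```

Calling `write_ndjson("multivalue.json", make_rows(1_000_000))` would produce a file to ingest while watching heap usage during persist. Raising `cardinality` grows the dictionary, which should make it easier to see whether a max dictionary size helps.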
