wjhypo opened a new issue #11301: URL: https://github.com/apache/druid/issues/11301
### Description

Add an option to enable a bitmap index in IncrementalIndex during real-time ingestion. This improves query efficiency by eliminating the CPU wasted on full scans.

### Motivation

When real-time data from Kafka is ingested through peon tasks, the rows are first stored in an in-memory data structure, a map. When a query comes in, every row in the map is iterated to check whether it matches the filter. This is inherently inefficient: in many cases only a small percentage of the rows qualify for the final aggregation, so most of the scan is wasted CPU. Segments ingested through batch ingestion, by contrast, have bitmap indexes that can pinpoint the candidate rows for the filter in a query, which is much more efficient.

Currently, to meet the SLA (high QPS, low latency) of some of our real-time use cases, we have to provision many peon tasks and replicas to distribute the load, which is quite costly. If bitmaps could optionally be enabled while the data is still in memory, query performance would improve and the infrastructure cost of supporting the same or a better SLA would drop. In our production use case, the in-memory bitmap lets us use 30% of the previous capacity to support the same QPS and latency.
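The difference between the two access paths can be illustrated with a toy Python model. This is only a sketch under stated assumptions, not Druid's IncrementalIndex implementation: `ToyIncrementalIndex`, `filter_scan`, `filter_bitmap`, and `bits_to_row_ids` are hypothetical names, and plain Python integers stand in for the compressed bitmaps a real index would use.

```python
from collections import defaultdict


def bits_to_row_ids(bits):
    """Return the positions of the set bits in ascending order."""
    row_ids = []
    while bits:
        low = bits & -bits                    # isolate the lowest set bit
        row_ids.append(low.bit_length() - 1)  # its position is the row id
        bits ^= low                           # clear it and continue
    return row_ids


class ToyIncrementalIndex:
    """Hypothetical append-only row store with an optional bitmap index."""

    def __init__(self, dimensions, use_bitmap=False):
        self.dimensions = list(dimensions)
        self.use_bitmap = use_bitmap
        self.rows = []
        # bitmaps[dim][value] is an int whose bit i is set when row i has
        # that value for that dimension.
        self.bitmaps = defaultdict(lambda: defaultdict(int))

    def add(self, row):
        row_id = len(self.rows)
        self.rows.append(row)
        if self.use_bitmap:
            # Index maintenance happens at ingestion time, one bit per dimension.
            for dim in self.dimensions:
                self.bitmaps[dim][row.get(dim)] |= 1 << row_id

    def filter_scan(self, filters):
        """Full scan: test every row against every filter clause."""
        return [
            i for i, row in enumerate(self.rows)
            if all(row.get(dim) == value for dim, value in filters.items())
        ]

    def filter_bitmap(self, filters):
        """Bitmap path: AND the per-value bitmaps, then read off the set bits.

        Only dimensions passed to the constructor are indexed.
        """
        if not self.use_bitmap:
            raise ValueError("bitmap index is disabled")
        bits = (1 << len(self.rows)) - 1      # start with every row selected
        for dim, value in filters.items():
            bits &= self.bitmaps.get(dim, {}).get(value, 0)
        return bits_to_row_ids(bits)


index = ToyIncrementalIndex(["country", "device"], use_bitmap=True)
for row in [
    {"country": "US", "device": "ios", "clicks": 3},
    {"country": "DE", "device": "android", "clicks": 1},
    {"country": "US", "device": "android", "clicks": 7},
    {"country": "US", "device": "ios", "clicks": 2},
]:
    index.add(row)

query = {"country": "US", "device": "ios"}
matching = index.filter_bitmap(query)
# Both paths select the same rows; only the work done to find them differs.
assert matching == index.filter_scan(query)
total_clicks = sum(index.rows[i]["clicks"] for i in matching)
```

The scan path costs one comparison per row per filter clause regardless of selectivity. The bitmap path costs one lookup and one AND per clause, followed by a walk over the matching rows only. The price is extra memory and extra work on every `add`, which is presumably why the issue proposes making it optional.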
