wjhypo opened a new issue #11301:
URL: https://github.com/apache/druid/issues/11301


   ### Description
   Add an option to build bitmap indexes in IncrementalIndex during real-time 
ingestion, so that queries can use them to avoid the CPU wasted on full 
scans.
   
   ### Motivation
   When real-time data from Kafka is ingested through peon tasks, it is 
first stored in an in-memory data structure, a map. When a query comes in, 
every row in the map is iterated to check whether it matches the filter. This 
is inherently inefficient: in many cases only a small percentage of the rows 
qualify for final aggregation, so the full scan wastes CPU. Segments produced 
by batch ingestion, by contrast, have bitmap indexes that can pinpoint the 
candidate rows for a query's filter, which is much more efficient. Currently, 
to meet the SLA (high QPS, low latency) of some of our real-time use cases, we 
have to provision many peon tasks and replicas to distribute the load, which 
is quite costly. If bitmaps could optionally be enabled while data is still in 
memory, query performance would improve and the infrastructure cost of 
supporting the same or a better SLA would drop. In our production use case, 
in-memory bitmaps let us use 30% of the previous capacity to support the same 
QPS and latency.
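
   To illustrate the difference between the two query paths, here is a minimal 
sketch. It is not Druid's IncrementalIndex code: the class 
`InMemoryBitmapSketch` and its methods are hypothetical names, the rows have a 
single string dimension and one long metric, and `java.util.BitSet` stands in 
for whatever bitmap implementation would actually be used.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Druid's IncrementalIndex API.
class InMemoryBitmapSketch {
    // Rows in arrival order: one dimension value and one metric per row.
    private final List<String> dimValues = new ArrayList<>();
    private final List<Long> metrics = new ArrayList<>();
    // Inverted index: dimension value -> bitmap of row ids holding that value.
    private final Map<String, BitSet> bitmaps = new HashMap<>();

    // Append a row and set its bit in the bitmap for its dimension value.
    public void add(String dimValue, long metric) {
        int rowId = dimValues.size();
        dimValues.add(dimValue);
        metrics.add(metric);
        bitmaps.computeIfAbsent(dimValue, k -> new BitSet()).set(rowId);
    }

    // Full scan: every row is tested against the filter.
    public long sumByScan(String wanted) {
        long sum = 0;
        for (int i = 0; i < dimValues.size(); i++) {
            if (dimValues.get(i).equals(wanted)) {
                sum += metrics.get(i);
            }
        }
        return sum;
    }

    // Bitmap path: only rows whose bit is set are visited.
    public long sumByBitmap(String wanted) {
        BitSet rows = bitmaps.get(wanted);
        if (rows == null) {
            return 0;
        }
        long sum = 0;
        for (int i = rows.nextSetBit(0); i >= 0; i = rows.nextSetBit(i + 1)) {
            sum += metrics.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        InMemoryBitmapSketch index = new InMemoryBitmapSketch();
        index.add("US", 10);
        index.add("CA", 5);
        index.add("US", 7);
        System.out.println(index.sumByScan("US"));   // prints 17
        System.out.println(index.sumByBitmap("US")); // prints 17
    }
}
```

   Both methods return the same sum, but the bitmap path touches only the 
matching rows. The trade-off in this sketch is the extra work and memory spent 
maintaining the bitmaps on every insert, which is one reason to make the 
feature optional.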
   
   
   




