jasperjiaguo opened a new issue #4461: Proposal for Automated Inverted Index Recommendation
URL: https://github.com/apache/incubator-pinot/issues/4461
 
 
   Adding inverted indices to frequently used dimensions is very helpful for
cutting down query latency. Currently, whether to apply an inverted index is
left entirely to the system admin. This can involve a lot of manual work, and
the approach has several drawbacks:
     1. Choosing a proper set of dimensions on which to apply inverted indices
can be very time-consuming. A system admin has to pull up the logs and scan
through the queries for frequently used dimensions, which becomes impractical
at high QPS or with multiple use cases onboarded.
     2. Furthermore, this approach does not guarantee that the inverted indices
chosen are optimal or that they target the right queries (say, one wants to
decrease the latency of slow-running queries above the 90th-percentile
latency).
     3. Adding an inverted index to a column incurs, on average, a 100% storage
penalty. There is also a non-zero cost of applying an inverted index in the
filtering phase, which means there is a "sweet spot" for the number of inverted
indices.
     4. For many use cases with AND-connected predicates, it is important to
pick the predicate with the best selectivity, since it does the most to cut
down the time spent in the lookup phase.
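To make the selectivity point concrete, here is a minimal, hypothetical sketch (not Pinot's implementation; the functions `build_inverted_index` and `filter_and` and the columns `country` and `member_id` are illustrative names). It evaluates an AND of equality predicates by looking up one column in an inverted index and then checking the remaining predicates only on the candidate rows. Indexing the highly selective column leaves far fewer rows to scan than indexing the low-cardinality one.

```python
from collections import defaultdict


def build_inverted_index(rows, column):
    """Map each value of `column` to the list of row IDs holding that value."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        index[row[column]].append(row_id)
    return index


def filter_and(rows, predicates, indexed_column):
    """Evaluate an AND of (column, value) equality predicates.

    The predicate on `indexed_column` is resolved through an inverted index;
    the remaining predicates are checked row by row on the candidates.
    Returns (matching row IDs, number of candidate rows scanned).
    """
    index = build_inverted_index(rows, indexed_column)
    preds = dict(predicates)
    candidates = index.get(preds[indexed_column], [])
    rest = [(c, v) for c, v in preds.items() if c != indexed_column]
    matches = [i for i in candidates if all(rows[i][c] == v for c, v in rest)]
    return matches, len(candidates)


# Synthetic data: "country" has low cardinality (90% "US"), "member_id" is unique.
rows = [{"country": "US" if i % 10 else "CA", "member_id": i} for i in range(1000)]
preds = [("country", "US"), ("member_id", 42)]

print(filter_and(rows, preds, "country"))    # index on the unselective column
print(filter_and(rows, preds, "member_id"))  # index on the selective column
```

Both calls return the same match (row 42), but indexing `country` leaves 900 candidate rows to scan, while indexing `member_id` leaves just one.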
   
   Based on these points, we want to develop an automated process for
recommending inverted indices.
