Does Pinot stop early once enough results have been collected? We have queries of the form "SELECT * FROM table WHERE userID='H' AND sourceEventTimestamp>=t1 AND sourceEventTimestamp<=t2 ORDER BY sourceEventTimestamp DESC LIMIT 500".
The table is sorted by sourceEventTimestamp, and userID has an inverted index. The query's filter is not very selective (many rows pass the condition), so the first 500 results should be collected relatively quickly. However, execution times are too long (> 10 s).

=============== Slack conversation with Kishore and Xiang attached below ===============

Kishore G <https://app.slack.com/team/UDRJ7G85T> 12:32 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582057958151100> We can do early termination if there is no order by
12:33 <https://apache-pinot.slack.com/archives/CDRCA57FC/p1582058018152300> But with order by, there is nothing much we can do to terminate early...
12:34 <https://apache-pinot.slack.com/archives/CDRCA57FC/p1582058040153100> What is the problem you are trying to solve?

Ting Chen <https://app.slack.com/team/UG3BZ4ALQ> 12:59 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582059583158200> the main issue we have is query latency is too long (~15 s). For early termination, since the table is physically sorted by the ORDER_BY column, I suppose an ideal plan is to check the relevant segments (starting with the segments with the largest value in the filtering range) and stop when enough results have been collected?

Kishore G <https://app.slack.com/team/UDRJ7G85T> 1:02 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582059747160100> That’s possible, what is the time range in the query

Ting Chen <https://app.slack.com/team/UG3BZ4ALQ> 1:04 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582059864161200> from 7 days ago to a few seconds ago. Basically the past 7 days' data.

Kishore G <https://app.slack.com/team/UDRJ7G85T> 1:10 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582060200165000> It’s a good optimization to have. Worth starting a thread and discussing further. For now, is it possible for the client to break it up into multiple queries, one for each day?

Ting Chen <https://app.slack.com/team/UG3BZ4ALQ> 1:12 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582060338167500> I will file an issue for this and do some investigation of the code. Yes, your idea is basically the workaround for now. I asked the customers to look at the past 1 day's data instead: they still got the results they needed while the latency is halved.

Kishore G <https://app.slack.com/team/UDRJ7G85T> 1:14 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582060485168700> Cool. What you want is doable with some optimization in the planning phase..

Xiang Fu <https://app.slack.com/team/UGRJA9TEH> 1:46 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582062381169400> @Ting Chen <https://apache-pinot.slack.com/team/UG3BZ4ALQ>
1:46 <https://apache-pinot.slack.com/archives/CDRCA57FC/p1582062405170000> one thing about this is that the query will hit many segments and merge the results
1:47 <https://apache-pinot.slack.com/archives/CDRCA57FC/p1582062448170800> so it’s hard to tell the global ordering to do early termination
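
For reference, the client-side workaround discussed above (one query per day, newest day first, stopping once 500 rows are collected) could look roughly like the sketch below. This is only an illustration, not Pinot code: `fetch_latest` and `run_query` are hypothetical names, `run_query` stands in for whatever client call submits SQL and returns rows, and the timestamps are assumed to be integer epoch milliseconds.

```python
from typing import Any, Callable, List

DAY_MS = 24 * 60 * 60 * 1000  # assumption: sourceEventTimestamp is epoch millis


def fetch_latest(
    run_query: Callable[[str], List[Any]],  # hypothetical: submits SQL, returns rows
    user_id: str,
    t1: int,
    t2: int,
    limit: int = 500,
    window_ms: int = DAY_MS,
) -> List[Any]:
    """Walk [t1, t2] in disjoint windows, newest first, until `limit` rows are found."""
    rows: List[Any] = []
    hi = t2
    while hi >= t1 and len(rows) < limit:
        lo = max(t1, hi - window_ms + 1)
        # Only ask for the rows still missing; each window is already sorted DESC,
        # and windows are visited newest first, so concatenation keeps global order.
        sql = (
            f"SELECT * FROM table WHERE userID='{user_id}' "
            f"AND sourceEventTimestamp>={lo} AND sourceEventTimestamp<={hi} "
            f"ORDER BY sourceEventTimestamp DESC LIMIT {limit - len(rows)}"
        )
        rows.extend(run_query(sql))
        hi = lo - 1  # next window ends just before this one starts
    return rows
```

When the filter matches many rows, the loop usually ends after the first window, so only the most recent day's segments are queried. In the worst case (few matching rows) it issues one query per day, which can be slower than the single 7-day query.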
