richardstartin opened a new issue #8453: URL: https://github.com/apache/pinot/issues/8453
The plan for the query below is extremely inefficient: ```sql select * from githubEvents where repo_name like 'FeedScale%' and type = 'PullRequestEvent' limit 100 ``` | Operator | Operator_Id | Parent_Id | |----------|---------------|------------| | BROKER_REDUCE(limit:100) | 0 | -1 | | COMBINE_SELECT | 1 | 0 | | SELECT(selectList:actor_id, actor_login, actor_url, created_at, event_time, event_time_ts, id, public, pull_request_additions, pull_request_changed_files, pull_request_comments, pull_request_commits, pull_request_deletions, pull_request_id, pull_request_review_comments, pull_request_state, pull_request_title, pull_request_url, repo_id, repo_name, repo_url, type) | 2 | 1 | | TRANSFORM_PASSTHROUGH(actor_id, actor_login, actor_url, created_at, event_time, event_time_ts, id, public, pull_request_additions, pull_request_changed_files, pull_request_comments, pull_request_commits, pull_request_deletions, pull_request_id, pull_request_review_comments, pull_request_state, pull_request_title, pull_request_url, repo_id, repo_name, repo_url, type) | 3 | 2 | | PROJECT(repo_url, pull_request_comments, pull_request_changed_files, actor_login, pull_request_deletions, created_at, pull_request_url, type, pull_request_id, event_time_ts, public, pull_request_review_comments, pull_request_title, pull_request_state, repo_id, actor_url, repo_name, actor_id, id, pull_request_additions, pull_request_commits, event_time) | 4 | 3 | | FILTER_AND | 5 | 4 | | FILTER_INVERTED_INDEX(indexLookUp:inverted_index,operator:EQ,predicate:type = 'PullRequestEvent') | 6 | 5 | | FILTER_FULL_SCAN(operator:REGEXP_LIKE,predicate:regexp_like(repo_name,'^FeedScale.*$')) | 7 | 5 | Instead of iterating over the result of the `FILTER_INVERTED_INDEX` operator, the filter is done in parallel with a full scan and then is intersected with other filters. This as a few implications: 1. no amount of filtering will ever make this query produce a result without timing out unless the user has a text index 2. not even the limit can prevent the query from scanning the entire table because there may be fewer matches than the limit 3. Pinot users shouldn't put any expression in a where clause which isn't guaranteed to match an index Cheap filters should be pushed down before expensive filters. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
