somandal opened a new pull request, #8738:
URL: https://github.com/apache/pinot/pull/8738

   A vast majority of the code changes are in the tests.
   
   The existing behavior for EXPLAIN PLAN has the following limitations:
   - The plan query is sent to only one randomly chosen server and evaluated 
on one randomly chosen segment. This can cause the following issues:
       - The segment may get pruned on the server side
       - The segment may produce an Empty filter during the operator tree 
creation
       - The segment may produce a Match All filter during the operator tree 
creation
       - AND and OR operators may have one or both predicate subtrees 
degenerate into an Empty or Match All filter. As a result, the AND or OR 
subtree may be collapsed into a leaf-level predicate, which makes the explain 
plan output confusing to users.
   - The overall approach is not a good representation of the distributed query 
planning for each segment
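
The collapse of AND/OR subtrees described above can be illustrated with a small sketch. This is not Pinot's implementation: the `simplify` function and the tuple encoding of filters are hypothetical, and only the ordinary boolean identities are assumed (for AND, an Empty child makes the whole node Empty and a Match All child is dropped; for OR, the roles are swapped).

```python
# Hypothetical encoding (not Pinot's): a filter is "MATCH_ALL", "EMPTY",
# a leaf predicate string, or a tuple ("AND" | "OR", [children]).
MATCH_ALL, EMPTY = "MATCH_ALL", "EMPTY"


def simplify(node):
    """Collapse AND/OR nodes whose children degenerate on a given segment."""
    if not isinstance(node, tuple):
        return node  # leaf predicate or an already degenerate filter
    op, children = node
    children = [simplify(child) for child in children]
    # The absorbing value decides the whole node; the neutral value is dropped.
    absorbing, neutral = (EMPTY, MATCH_ALL) if op == "AND" else (MATCH_ALL, EMPTY)
    if absorbing in children:
        return absorbing
    kept = [child for child in children if child != neutral]
    if not kept:
        return neutral
    if len(kept) == 1:
        return kept[0]  # the AND/OR node disappears; only a leaf remains
    return (op, kept)
```

Under these assumptions, `simplify(("AND", [MATCH_ALL, "startswith(textIndexCol1,'daff') = 'true'"]))` returns just the leaf predicate, which is why a plan taken from a single segment can show no AND node even though the query contains one.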
   
   This PR addresses the above limitations with the following changes:
   - Send the plan query to all segments on all servers (the ones chosen by the 
Broker after broker side pruning) 
       - Each server returns a set of deduplicated plans along with the number 
of segments matching each plan
       - Each server returns the following data as part of the metadata:
           - Number of segments pruned by the server side
           - Number of segments with an empty filter tree
           - Number of segments with a match all filter tree
       - On the Broker side, the sets of plans returned by the servers are 
deduplicated again, and the number of segments matching each unique plan is 
updated accordingly
       - The Broker returns the number of segments pruned on the broker side 
as part of the BrokerResponse
       - Adds a verbose explain plan option which returns all of the 
deduplicated plans.
       - If verbose is disabled (the default), only one of the deduplicated 
plans is returned: the one with the deepest plan tree. This is a better 
approximation than the current explain plan functionality because plans are 
deduplicated across servers and segments.
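
The broker-side merge and the non-verbose selection can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names, the representation of a plan as a tuple of `(depth, operator)` rows, and the assumption that segment counts for identical plans are summed across servers are all hypothetical.

```python
from collections import Counter


def merge_server_plans(server_responses):
    """Deduplicate plans across servers, adding up their segment counts.

    Each response is assumed to be a dict mapping a hashable plan to the
    number of segments on that server that produced it.
    """
    merged = Counter()
    for plans in server_responses:
        merged.update(plans)  # Counter.update adds counts for existing keys
    return merged


def plan_depth(plan):
    """Depth of a plan, taken as the largest depth of any of its rows."""
    return max(depth for depth, _operator in plan)


def select_plans(merged, verbose):
    """Return every unique plan if verbose, else only the deepest one."""
    if verbose:
        return dict(merged)
    deepest = max(merged, key=plan_depth)  # ties are broken arbitrarily here
    return {deepest: merged[deepest]}
```

For example, if one server reports an AND plan for 2 segments and a leaf-only plan for 1 segment, and a second server reports the same AND plan for 1 segment, the merged result holds the AND plan with a count of 3 and the leaf-only plan with a count of 1. With `verbose=False`, only the AND plan is returned.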
   
   Example query: 
   ```
   EXPLAIN PLAN FOR SELECT invertedIndexCol1, noIndexCol1 FROM testTable WHERE startsWith (textIndexCol1, 'daff') AND noIndexCol4
   ```
   
   Here's an example of the Explain Plan output with verbose mode enabled:
   ```
   BROKER_REDUCE
       COMBINE_SELECT
           PLAN_START(numSegmentsForThisPlan:3)
               SELECT(selectList:invertedIndexCol1, noIndexCol1)
                   TRANSFORM_PASSTHROUGH(invertedIndexCol1, noIndexCol1)
                        PROJECT(invertedIndexCol1, noIndexCol1)
                            DOC_ID_SET
                                FILTER_AND
                                    FILTER_FULL_SCAN(operator:EQ,predicate:noIndexCol4 = 'true')
                                    FILTER_EXPRESSION(operator:EQ,predicate:startswith(textIndexCol1,'daff') = 'true')
           PLAN_START(numSegmentsForThisPlan:1)
               SELECT(selectList:invertedIndexCol1, noIndexCol1)
                   TRANSFORM_PASSTHROUGH(invertedIndexCol1, noIndexCol1)
                        PROJECT(invertedIndexCol1, noIndexCol1)
                            DOC_ID_SET
                                FILTER_EXPRESSION(operator:EQ,predicate:startswith(textIndexCol1,'daff') = 'true')
   ```
   
   With verbose mode disabled, only the first plan will be selected as it has 
the deepest tree:
   ```
   BROKER_REDUCE
       COMBINE_SELECT
           PLAN_START(numSegmentsForThisPlan:3)
               SELECT(selectList:invertedIndexCol1, noIndexCol1)
                   TRANSFORM_PASSTHROUGH(invertedIndexCol1, noIndexCol1)
                        PROJECT(invertedIndexCol1, noIndexCol1)
                            DOC_ID_SET
                                FILTER_AND
                                    FILTER_FULL_SCAN(operator:EQ,predicate:noIndexCol4 = 'true')
                                    FILTER_EXPRESSION(operator:EQ,predicate:startswith(textIndexCol1,'daff') = 'true')
   ```
   
   cc @siddharthteotia 

