lfernandez93 edited a comment on issue #7642:
URL: https://github.com/apache/pinot/issues/7642#issuecomment-953099072


   Thanks @kishoreg, this explains a lot. Basically, if 
`numEntriesScannedInFilter` is high, it means that selectivity is high and we 
may need some more pruning. That number wouldn't necessarily correlate with 
`numDocsScanned`, which is counted after the filter has been applied and 
reflects the final result(?).
   
   I just got weirded out because as soon as I remove the `time` clause the 
query is much faster. I was wondering why, since with `time` I'm trying to 
select even less data. I do see `numEntriesScannedInFilter` go to 0 after I 
remove the `time` clause.
   
   @richardstartin trace info
   
   `
       "traceInfo": {
           "pinot-server-1.pinot-server-headless.pinot.svc.cluster.local": 
"[{\"0\":[{\"GroupByOrderByCombineOperator 
Time\":53},{\"InstanceResponseOperator 
Time\":53}]},{\"0_0\":[{\"SortedIndexBasedFilterOperator 
Time\":0},{\"DocIdSetOperator Time\":0},{\"ProjectionOperator 
Time\":0},{\"PassThroughTransformOperator Time\":0},{\"DocIdSetOperator 
Time\":0},{\"ProjectionOperator Time\":0},{\"PassThroughTransformOperator 
Time\":0},{\"AggregationGroupByOrderByOperator 
Time\":0}]},{\"0_1\":[{\"SortedIndexBasedFilterOperator 
Time\":0},{\"ScanBasedFilterOperator Time\":0},{\"AndFilterOperator 
Time\":0},{\"DocIdSetOperator Time\":0},{\"ProjectionOperator 
Time\":0},{\"PassThroughTransformOperator Time\":0},{\"DocIdSetOperator 
Time\":0},{\"ProjectionOperator Time\":0},{\"PassThroughTransformOperator 
Time\":0},{\"AggregationGroupByOrderByOperator 
Time\":0}]},{\"0_2\":[{\"SortedIndexBasedFilterOperator 
Time\":0},{\"RangeFilterOperator Time\":53},{\"AndFilterOperator 
Time\":53},{\"DocIdSetOperator Time\":53},{\"ProjectionOperator 
Time\":53},{\"PassThroughTransformOperator 
Time\":53},{\"AggregationGroupByOrderByOperator Time\":53}]}]"
       },
   `
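To make the trace easier to read, here is a minimal sketch that decodes the `traceInfo` value and lists the reported time per operator for each trace entry. It assumes the value is a JSON-encoded string holding a list of single-key objects, as in the output above. The function name `summarize_trace` and the shortened `sample` are made up for illustration and are not part of Pinot.

```python
import json

def summarize_trace(raw_trace):
    """Map each trace id (e.g. "0_2") to {operator name: reported time}."""
    suffix = " Time"
    summary = {}
    for entry in json.loads(raw_trace):
        for trace_id, operators in entry.items():
            times = {}
            for op in operators:
                for name, ms in op.items():
                    # Drop the trailing " Time" so keys are plain operator names.
                    op_name = name[:-len(suffix)] if name.endswith(suffix) else name
                    # An operator can appear more than once; keep the largest value.
                    times[op_name] = max(times.get(op_name, 0), ms)
            summary[trace_id] = times
    return summary

# Shortened, hand-copied subset of the trace above (illustrative only).
sample = json.dumps([
    {"0": [{"GroupByOrderByCombineOperator Time": 53},
           {"InstanceResponseOperator Time": 53}]},
    {"0_2": [{"SortedIndexBasedFilterOperator Time": 0},
             {"RangeFilterOperator Time": 53},
             {"AndFilterOperator Time": 53}]},
])

for trace_id, times in summarize_trace(sample).items():
    for op_name, ms in times.items():
        print(trace_id, op_name, ms)
```

In the sample, entry `0_2` is the only one where a filter operator reports a non-zero time, and it is `RangeFilterOperator`, not `SortedIndexBasedFilterOperator`.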
   
   debug info
   
   ```json
       "numServersQueried": 1,
       "numServersResponded": 1,
       "numSegmentsQueried": 29,
       "numSegmentsProcessed": 3,
       "numSegmentsMatched": 2,
       "numConsumingSegmentsQueried": 1,
       "numDocsScanned": 84,
       "numEntriesScannedInFilter": 1169794,
       "numEntriesScannedPostFilter": 84,
       "numGroupsLimitReached": false,
       "totalDocs": 150832761,
       "timeUsedMs": 57,
       "offlineThreadCpuTimeNs": 0,
       "realtimeThreadCpuTimeNs": 53524933,
       "numRowsResultSet": 10,
       "minConsumingFreshnessTimeMs": 1635351812064
   ```
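As a rough way to compare the numbers above, this sketch computes how many entries were scanned in the filter per document that survived it, and how many segments were skipped before processing. The interpretation is an assumption based on my reading of the stats (`numSegmentsQueried - numSegmentsProcessed` as pruned segments), and `filter_scan_summary` is a hypothetical helper, not a Pinot API.

```python
# Values copied from the debug info above.
stats = {
    "numSegmentsQueried": 29,
    "numSegmentsProcessed": 3,
    "numSegmentsMatched": 2,
    "numDocsScanned": 84,
    "numEntriesScannedInFilter": 1169794,
    "numEntriesScannedPostFilter": 84,
    "totalDocs": 150832761,
}

def filter_scan_summary(s):
    docs = s["numDocsScanned"]
    in_filter = s["numEntriesScannedInFilter"]
    return {
        # Entries read while filtering for each document that passed the filter.
        "filter_entries_per_doc": in_filter / docs if docs else float("inf"),
        # Segments that were queried but never processed (assumed pruned).
        "segments_pruned": s["numSegmentsQueried"] - s["numSegmentsProcessed"],
        # Fraction of the table's documents that passed the filter.
        "fraction_docs_matched": docs / s["totalDocs"],
    }

summary = filter_scan_summary(stats)
for key, value in summary.items():
    print(key, value)
```

A large `filter_entries_per_doc` alongside a tiny `numDocsScanned` suggests the cost is in evaluating the filter itself, not in reading the matched rows.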
   
   table configs (`time` is `serve_time`, `id` is `user_id`):
   
   ```json
   "segmentsConfig": {
       "schemaName": "metrics",
       "retentionTimeUnit": "DAYS",
       "retentionTimeValue": "365",
       "timeColumnName": "serve_time",
       "replication": "1",
       "replicasPerPartition": "1"
     },
   "tableIndexConfig": {
       "bloomFilterColumns": [
           "user_id"
         ],
        "sortedColumn": [
           "user_id"
         ],
       "segmentPartitionConfig": {
         "columnPartitionMap": {
           "user_id": {
             "functionName": "Murmur",
             "numPartitions": 8
           }
         }
       },
   "routing": {
     "segmentPrunerTypes": [
       "partition"
     ]
    }
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


