lfernandez93 commented on issue #7642:
URL: https://github.com/apache/pinot/issues/7642#issuecomment-953099072
Thanks @kishoreg, this explains a lot. So basically, if
`numEntriesScannedInFilter` is high, it means selectivity is high and we may
need more pruning. That number definitely wouldn't correlate with
`numDocsScanned`, which is counted after the filter has been applied and
reflects the final result(?).
I was just thrown off because as soon as I remove the `time` clause the query
gets much faster, and I was wondering why, given that I'm trying to select even
less data with `time`. I do see `numEntriesScannedInFilter` go to 0 after I
remove the `time` clause.
@richardstartin trace info:
```json
"traceInfo": {
"pinot-server-1.pinot-server-headless.pinot.svc.cluster.local": "[{\"0\":[{\"GroupByOrderByCombineOperator Time\":53},{\"InstanceResponseOperator Time\":53}]},{\"0_0\":[{\"SortedIndexBasedFilterOperator Time\":0},{\"DocIdSetOperator Time\":0},{\"ProjectionOperator Time\":0},{\"PassThroughTransformOperator Time\":0},{\"DocIdSetOperator Time\":0},{\"ProjectionOperator Time\":0},{\"PassThroughTransformOperator Time\":0},{\"AggregationGroupByOrderByOperator Time\":0}]},{\"0_1\":[{\"SortedIndexBasedFilterOperator Time\":0},{\"ScanBasedFilterOperator Time\":0},{\"AndFilterOperator Time\":0},{\"DocIdSetOperator Time\":0},{\"ProjectionOperator Time\":0},{\"PassThroughTransformOperator Time\":0},{\"DocIdSetOperator Time\":0},{\"ProjectionOperator Time\":0},{\"PassThroughTransformOperator Time\":0},{\"AggregationGroupByOrderByOperator Time\":0}]},{\"0_2\":[{\"SortedIndexBasedFilterOperator Time\":0},{\"RangeFilterOperator Time\":53},{\"AndFilterOperator Time\":53},{\"DocIdSetOperator Time\":53},{\"ProjectionOperator Time\":53},{\"PassThroughTransformOperator Time\":53},{\"AggregationGroupByOrderByOperator Time\":53}]}]"
},
```
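In case it helps with reading the trace, here is a minimal Python sketch that decodes the per-server string and reports the slowest operator for each trace key. It assumes the shape seen above (a JSON-encoded list of `{key: [{"<Operator> Time": ms}, ...]}` entries); the function name `slowest_operators` and the host name `server-1` are made up for illustration, and the sample is a shortened copy of the trace, not the full thing.

```python
import json

SUFFIX = " Time"

def slowest_operators(trace_info):
    """Map each server to {trace_key: (operator, ms)} for its slowest operator."""
    summary = {}
    for server, raw in trace_info.items():
        per_key = {}
        # The per-server value is a JSON-encoded string, so it needs a second decode.
        for entry in json.loads(raw):
            for key, ops in entry.items():
                # Each op is a single-entry dict such as {"RangeFilterOperator Time": 53}.
                timings = [
                    (name[: -len(SUFFIX)] if name.endswith(SUFFIX) else name, ms)
                    for op in ops
                    for name, ms in op.items()
                ]
                # max() keeps the first operator that reaches the highest time.
                per_key[key] = max(timings, key=lambda t: t[1])
        summary[server] = per_key
    return summary

# Shortened copy of the trace; "server-1" is a placeholder host name.
sample = {
    "server-1": json.dumps([
        {"0": [{"GroupByOrderByCombineOperator Time": 53},
               {"InstanceResponseOperator Time": 53}]},
        {"0_2": [{"SortedIndexBasedFilterOperator Time": 0},
                 {"RangeFilterOperator Time": 53},
                 {"AndFilterOperator Time": 53},
                 {"AggregationGroupByOrderByOperator Time": 53}]},
    ])
}

for key, (operator, ms) in slowest_operators(sample)["server-1"].items():
    print(f"{key}: {operator} ({ms} ms)")
```

The times in the trace look cumulative along the operator chain, so picking the first operator that reaches the maximum is only a heuristic for where the time starts being spent.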
Debug info:
```json
"numServersQueried": 1,
"numServersResponded": 1,
"numSegmentsQueried": 29,
"numSegmentsProcessed": 3,
"numSegmentsMatched": 2,
"numConsumingSegmentsQueried": 1,
"numDocsScanned": 84,
"numEntriesScannedInFilter": 1169794,
"numEntriesScannedPostFilter": 84,
"numGroupsLimitReached": false,
"totalDocs": 150832761,
"timeUsedMs": 57,
"offlineThreadCpuTimeNs": 0,
"realtimeThreadCpuTimeNs": 53524933,
"numRowsResultSet": 10,
"minConsumingFreshnessTimeMs": 1635351812064
```
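To put the gap between the two counters in numbers, a small Python sketch like the following could be run against the response stats. The field names come from the debug info above; the function name `filter_scan_summary` and the choice of ratios are only illustrative assumptions, not anything Pinot itself reports.

```python
def filter_scan_summary(stats):
    """Derive a few rough ratios from Pinot broker response stats."""
    docs = stats["numDocsScanned"]
    in_filter = stats["numEntriesScannedInFilter"]
    return {
        # How many entries the filter phase touched per document that matched.
        "filter_entries_per_matched_doc": in_filter / docs if docs else float("inf"),
        # Share of the table that ended up in the result computation.
        "matched_fraction_of_total_docs": docs / stats["totalDocs"],
        # Share of queried segments that were processed rather than pruned.
        "segments_processed_fraction": (
            stats["numSegmentsProcessed"] / stats["numSegmentsQueried"]
        ),
    }

# Values copied from the debug info in this comment.
stats = {
    "numSegmentsQueried": 29,
    "numSegmentsProcessed": 3,
    "numDocsScanned": 84,
    "numEntriesScannedInFilter": 1169794,
    "totalDocs": 150832761,
}

for name, value in filter_scan_summary(stats).items():
    print(f"{name}: {value:.6g}")
```

With these numbers the filter phase touches roughly 14,000 entries for every document that matches, which is the mismatch being discussed here.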
Table configs (`time` is `serve_time`, `id` is `user_id`):
```json
"segmentsConfig": {
  "schemaName": "ads_metrics",
  "retentionTimeUnit": "DAYS",
  "retentionTimeValue": "365",
  "timeColumnName": "serve_time",
  "replication": "1",
  "replicasPerPartition": "1"
},
"tableIndexConfig": {
  "bloomFilterColumns": [
    "user_id"
  ],
  "sortedColumn": [
    "user_id"
  ],
  "segmentPartitionConfig": {
    "columnPartitionMap": {
      "user_id": {
        "functionName": "Murmur",
        "numPartitions": 8
      }
    }
  },
  "routing": {
    "segmentPrunerTypes": [
      "partition"
    ]
  }
}
```