GSharayu commented on a change in pull request #6776:
URL: https://github.com/apache/incubator-pinot/pull/6776#discussion_r612032042
##########
File path:
pinot-core/src/main/java/org/apache/pinot/core/query/pruner/ColumnValueSegmentPruner.java
##########
@@ -238,6 +237,66 @@ private boolean pruneRangePredicate(IndexSegment segment, RangePredicate rangePr
return false;
}
+ /**
+ * For an IN predicate, the segment is not pruned if the number of values is greater than the threshold.
+ * Prune the segment based on:
+ * <ul>
+ * <li>Column min/max value</li>
+ * </ul>
+ * Returns:
+ * <ul>
+ * <li> true if the segment can be pruned, i.e. every value is smaller than the segment's min value or greater than its max value </li>
+ * <li> false if the number of values is greater than the threshold, or any value lies within the segment's [min, max] range</li>
+ * </ul>
+ */
+ private boolean pruneInPredicate(IndexSegment segment, InPredicate inPredicate, Map<String, DataSource> dataSourceCache) {
+ String column = inPredicate.getLhs().getIdentifier();
+ DataSource dataSource = dataSourceCache.computeIfAbsent(column, segment::getDataSource);
+ // NOTE: Column must exist after DataSchemaSegmentPruner
+ assert dataSource != null;
+ DataSourceMetadata dataSourceMetadata = dataSource.getDataSourceMetadata();
+ List<String> values = inPredicate.getValues();
+ // Do not prune if the number of values exceeds the threshold
+ if (values.size() > _inPredicateThreshold) {
+ return false;
+ }
+
+ for (String value : values) {
+ Comparable inValue = convertValue(value, dataSourceMetadata.getDataType());
+ if (!checkMinMaxRange(dataSourceMetadata, inValue)) {
+ return false;
+ }
+ }
+ return true;
+ }
+
+ /**
+ * Check whether the comparable value falls outside the column's min/max range
+ * <ul>
+ * <li>Column min/max value</li>
+ * </ul>
+ * Returns:
+ * <ul>
+ * <li> true if the value is smaller than the min value or greater than the max value</li>
+ * <li> false if the value lies within the [min, max] range, i.e. it is greater than or equal to the min value and smaller than or equal to the max value</li>
+ * </ul>
+ */
+ private boolean checkMinMaxRange(DataSourceMetadata dataSourceMetadata, Comparable value) {
Review comment:
Done for part 1! For the optional part, how would you suggest doing it? For a
cleaner approach, it seems the function would need to be split into separate
functions that check minValue and maxValue independently.
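One possible shape for that split, sketched under stated assumptions: two single-purpose helpers (`isBelowMin`, `isAboveMax`) composed into an out-of-range check, which the IN-predicate loop then uses. `ColumnStats`, `isBelowMin`, `isAboveMax`, `isOutOfRange` and `canPruneIn` are hypothetical names for illustration, not Pinot APIs. Values are assumed to be already converted to the column's type (the role `convertValue` plays in the diff), and min/max are assumed non-null.

```java
import java.util.List;

/**
 * Sketch of IN-predicate min/max pruning with the range check split into
 * independent min and max helpers. All names here are hypothetical.
 */
final class MinMaxPruningSketch {

  /** Hypothetical stand-in for a column's min/max metadata (both assumed non-null). */
  static final class ColumnStats<T extends Comparable<T>> {
    final T minValue;
    final T maxValue;

    ColumnStats(T minValue, T maxValue) {
      this.minValue = minValue;
      this.maxValue = maxValue;
    }
  }

  /** Returns true if the value is strictly smaller than the column min. */
  static <T extends Comparable<T>> boolean isBelowMin(ColumnStats<T> stats, T value) {
    return value.compareTo(stats.minValue) < 0;
  }

  /** Returns true if the value is strictly greater than the column max. */
  static <T extends Comparable<T>> boolean isAboveMax(ColumnStats<T> stats, T value) {
    return value.compareTo(stats.maxValue) > 0;
  }

  /** Returns true if the value lies outside [min, max], so no row can match it. */
  static <T extends Comparable<T>> boolean isOutOfRange(ColumnStats<T> stats, T value) {
    return isBelowMin(stats, value) || isAboveMax(stats, value);
  }

  /**
   * Returns true if the segment can be pruned: the IN list is not larger than
   * the threshold and every value lies outside [min, max].
   */
  static <T extends Comparable<T>> boolean canPruneIn(ColumnStats<T> stats, List<T> values, int threshold) {
    if (values.size() > threshold) {
      return false; // Too many values: skip the check and keep the segment.
    }
    for (T value : values) {
      if (!isOutOfRange(stats, value)) {
        return false; // This value may match a row in the segment.
      }
    }
    return true;
  }

  public static void main(String[] args) {
    ColumnStats<Integer> stats = new ColumnStats<>(10, 20);
    System.out.println(canPruneIn(stats, List.of(1, 25), 10)); // true
    System.out.println(canPruneIn(stats, List.of(1, 15), 10)); // false
    System.out.println(canPruneIn(stats, List.of(1, 25), 1));  // false
  }
}
```

A possible benefit of keeping the helpers separate is that a one-sided check could call just one of them without evaluating the other bound; whether that fits `pruneRangePredicate` depends on code not shown in this hunk.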
--
This is an automated message from the Apache Git Service.