GSharayu commented on a change in pull request #6776:
URL: https://github.com/apache/incubator-pinot/pull/6776#discussion_r612032042



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/pruner/ColumnValueSegmentPruner.java
##########
@@ -238,6 +237,66 @@ private boolean pruneRangePredicate(IndexSegment segment, RangePredicate rangePr
     return false;
   }
 
+  /**
+   * For an IN predicate, the segment will not be pruned if the number of values is greater than the threshold.
+   * Prune the segment based on:
+   * <ul>
+   *   <li>Column min/max value</li>
+   * </ul>
+   * Returns:
+   * <ul>
+   *   <li> true if segment can be pruned </li>
+   *   <li> false if the number of values > threshold or any of the values lies within the segment's [min, max] range</li>
+   * </ul>
+   */
+  private boolean pruneInPredicate(IndexSegment segment, InPredicate inPredicate, Map<String, DataSource> dataSourceCache) {
+    String column = inPredicate.getLhs().getIdentifier();
+    DataSource dataSource = dataSourceCache.computeIfAbsent(column, segment::getDataSource);
+    // NOTE: Column must exist after DataSchemaSegmentPruner
+    assert dataSource != null;
+    DataSourceMetadata dataSourceMetadata = dataSource.getDataSourceMetadata();
+    List<String> values = inPredicate.getValues();
+    // Do not prune if the number of values exceeds the threshold
+    if (values.size() > _inPredicateThreshold) {
+      return false;
+    }
+
+    for (String value : values) {
+      Comparable inValue = convertValue(value, dataSourceMetadata.getDataType());
+      if (!checkMinMaxRange(dataSourceMetadata, inValue)) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  /**
+   * Check if the comparable value is outside the segment's min/max range
+   * <ul>
+   *   <li>Column min/max value</li>
+   * </ul>
+   * Returns:
+   * <ul>
+   *   <li> true if the value is smaller than the min value or greater than the max value</li>
+   *   <li> false if the value is within the [min, max] range (inclusive)</li>
+   * </ul>
+   */
+  private boolean checkMinMaxRange(DataSourceMetadata dataSourceMetadata, Comparable value) {

Review comment:
       Done for part 1! For the optional part, how would you suggest approaching it? For a cleaner approach, the function might need to be split into separate functions that check minValue and maxValue independently.
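One possible shape for that split, as a minimal standalone Java sketch rather than the PR's code: the class and the names `isBelowMin`, `isAboveMax`, `isOutOfRange` and `canPruneIn` are hypothetical, the segment's min/max are passed in as plain `Comparable` values instead of Pinot's `DataSourceMetadata`, and a `null` bound is assumed to mean "unknown, do not prune on that side". It keeps the semantics described in the diff: prune only when the IN list is within the threshold and every value lies outside the inclusive [min, max] range.

```java
import java.util.List;

// Hypothetical sketch; not Pinot's API. Min/max are the segment's observed (inclusive) bounds.
final class MinMaxPruningSketch {

  // True only if a known min exists and the value is strictly below it.
  static <T extends Comparable<T>> boolean isBelowMin(T minValue, T value) {
    return minValue != null && value.compareTo(minValue) < 0;
  }

  // True only if a known max exists and the value is strictly above it.
  static <T extends Comparable<T>> boolean isAboveMax(T maxValue, T value) {
    return maxValue != null && value.compareTo(maxValue) > 0;
  }

  // A value cannot match any row in the segment if it falls outside [min, max].
  static <T extends Comparable<T>> boolean isOutOfRange(T minValue, T maxValue, T value) {
    return isBelowMin(minValue, value) || isAboveMax(maxValue, value);
  }

  // Prune only if the IN list is small enough and no value can fall inside the segment's range.
  static <T extends Comparable<T>> boolean canPruneIn(T minValue, T maxValue, List<T> values, int threshold) {
    if (values.size() > threshold) {
      return false; // too many values: skip the per-value check and keep the segment
    }
    for (T value : values) {
      if (!isOutOfRange(minValue, maxValue, value)) {
        return false; // at least one value may match, so the segment must be scanned
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(canPruneIn(10, 20, List.of(1, 5, 25), 10)); // true
    System.out.println(canPruneIn(10, 20, List.of(1, 15), 10));    // false
    System.out.println(canPruneIn(10, 20, List.of(1, 2, 3), 2));   // false
  }
}
```

Keeping the two bound checks separate makes each comparison trivially testable on its own, while `isOutOfRange` still gives the IN-predicate loop a single call.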



