Copilot commented on code in PR #16478:
URL: https://github.com/apache/pinot/pull/16478#discussion_r2246479840
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/RegexpLikePredicateEvaluatorFactory.java:
##########
@@ -32,6 +36,9 @@ public class RegexpLikePredicateEvaluatorFactory {
private RegexpLikePredicateEvaluatorFactory() {
}
+ /// When the cardinality of the dictionary is less than this threshold, scan
the dictionary to get the matching ids.
Review Comment:
Use standard Java comment syntax `//` instead of `///` for single-line
comments.
```suggestion
// When the cardinality of the dictionary is less than this threshold,
scan the dictionary to get the matching ids.
```
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/RegexpLikePredicateEvaluatorFactory.java:
##########
@@ -56,16 +66,64 @@ public static BaseDictionaryBasedPredicateEvaluator
newDictionaryBasedEvaluator(
*/
public static BaseRawValueBasedPredicateEvaluator
newRawValueBasedEvaluator(RegexpLikePredicate regexpLikePredicate,
DataType dataType) {
- Preconditions.checkArgument(dataType == DataType.STRING, "Unsupported data
type: " + dataType);
+ Preconditions.checkArgument(dataType.getStoredType() == DataType.STRING,
"Unsupported data type: " + dataType);
return new RawValueBasedRegexpLikePredicateEvaluator(regexpLikePredicate);
}
- private static final class DictionaryBasedRegexpLikePredicateEvaluator
extends BaseDictionaryBasedPredicateEvaluator {
+ private static final class DictIdBasedRegexpLikePredicateEvaluator
+ extends BaseDictIdBasedRegexpLikePredicateEvaluator {
+ final IntSet _matchingDictIdSet;
+
+ public DictIdBasedRegexpLikePredicateEvaluator(RegexpLikePredicate
regexpLikePredicate, Dictionary dictionary) {
+ super(regexpLikePredicate, dictionary);
+ Matcher matcher = regexpLikePredicate.getPattern().matcher("");
+ IntList matchingDictIds = new IntArrayList();
+ int dictionarySize = _dictionary.length();
+ for (int dictId = 0; dictId < dictionarySize; dictId++) {
+ if (matcher.reset(dictionary.getStringValue(dictId)).find()) {
+ matchingDictIds.add(dictId);
+ }
+ }
+ int numMatchingDictIds = matchingDictIds.size();
+ if (numMatchingDictIds == 0) {
+ _alwaysFalse = true;
+ } else if (dictionarySize == numMatchingDictIds) {
+ _alwaysTrue = true;
+ }
+ _matchingDictIds = matchingDictIds.toIntArray();
+ _matchingDictIdSet = new IntOpenHashSet(_matchingDictIds);
+ }
+
+ @Override
+ public int getNumMatchingItems() {
+ return _matchingDictIdSet.size();
+ }
+
+ @Override
+ public boolean applySV(int dictId) {
+ return _matchingDictIdSet.contains(dictId);
+ }
+
+ @Override
+ public int applySV(int limit, int[] docIds, int[] values) {
+ // reimplemented here to ensure applySV can be inlined
Review Comment:
The comment 'reimplemented here to ensure applySV can be inlined' should be
more specific about why this reimplementation is necessary and what performance
benefit it provides.
```suggestion
// This method is reimplemented to ensure it can be inlined by the JVM
for performance optimization.
// Inlining allows the JVM to eliminate method call overhead and apply
further optimizations such as loop unrolling.
// This is particularly beneficial in scenarios where this method is
called frequently during query execution,
// as it reduces runtime overhead and improves overall query
performance.
```
##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/OfflineClusterIntegrationTest.java:
##########
@@ -2209,6 +2212,22 @@ private void testNewAddedColumns()
assertEquals(row.get(12).asDouble(), 0.0);
}
+ private void testRegexpLikeOnNewAddedColumns()
+ throws Exception {
+ int numTotalDocs = (int) getCountStarResult();
+
+ // REGEXP_LIKE on new added dictionary-encoded columns should not scan the
table when it matches all or nothing
Review Comment:
Change 'new added' to 'newly added' for correct grammar.
```suggestion
// REGEXP_LIKE on newly added dictionary-encoded columns should not scan
the table when it matches all or nothing
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]