Jackie-Jiang commented on code in PR #11739:
URL: https://github.com/apache/pinot/pull/11739#discussion_r1372363740


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractIndexTransformFunction.java:
##########
@@ -0,0 +1,261 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.math.BigDecimal;
+import java.util.List;
+import java.util.Map;
+import org.apache.pinot.core.operator.ColumnContext;
+import org.apache.pinot.core.operator.blocks.ValueBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.segment.spi.index.reader.JsonIndexReader;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.roaringbitmap.RoaringBitmap;
+
+
+/**
+ * The <code>JsonExtractIndexTransformFunction</code> provides the same behavior as JsonExtractScalar, with the
+ * implementation changed to read values from the JSON index. For large JSON blobs this can be faster than parsing
+ * GBs of JSON at query time. For small JSON blobs/highly filtered input this is generally slower than the *scalar
+ * implementation. The inflection point is highly dependent on the number of docs remaining post filter.
+ */
+public class JsonExtractIndexTransformFunction extends BaseTransformFunction {
+  public static final String FUNCTION_NAME = "jsonExtractIndex";
+
+  private TransformFunction _jsonFieldTransformFunction;
+  private String _jsonPathString;
+  private TransformResultMetadata _resultMetadata;
+  private JsonIndexReader _jsonIndexReader;
+  private Object _defaultValue;
+  private Map<String, RoaringBitmap> _jsonIndexReaderContext;

Review Comment:
   Rename to `_matchingDocsMap`



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -294,6 +296,52 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) {
     }
   }
 
+  @Override
+  public Map<String, RoaringBitmap> getMatchingDocsMap(String key) {
+    Map<String, RoaringBitmap> cache = new HashMap<>();
+    _readLock.lock();
+    try {
+      for (Map.Entry<String, RoaringBitmap> entry : _postingListMap.entrySet()) {
+        if (!entry.getKey().startsWith(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR)) {
+          continue;
+        }
+        MutableRoaringBitmap flattenedDocIds = entry.getValue().toMutableRoaringBitmap();
+        PeekableIntIterator it = flattenedDocIds.getIntIterator();
+        MutableRoaringBitmap postingList = new MutableRoaringBitmap();
+        while (it.hasNext()) {
+          postingList.add(_docIdMapping.getInt(it.next()));
+        }
+        String val = entry.getKey().substring(key.length() + 1);
+        cache.put(val, postingList.toRoaringBitmap());
+      }
+    } finally {
+      _readLock.unlock();
+    }
+    return cache;
+  }
+
+  @Override
+  public String[] getValuesForKeyAndDocs(int[] docIds, Map<String, RoaringBitmap> context) {

Review Comment:
   ```suggestion
     public String[] getValuesForKeyAndDocs(int[] docIds, Map<String, RoaringBitmap> matchingDocsMap) {
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -300,6 +305,70 @@ private int getDocId(int flattenedDocId) {
     return _docIdMapping.getInt((long) flattenedDocId << 2);
   }
 
+  @Override
+  public Map<String, RoaringBitmap> getMatchingDocsMap(String key) {

Review Comment:
   Same for this class: apply the `matchingDocsMap` renames here as well



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractIndexTransformFunction.java:
##########
@@ -0,0 +1,261 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.math.BigDecimal;
+import java.util.List;
+import java.util.Map;
+import org.apache.pinot.core.operator.ColumnContext;
+import org.apache.pinot.core.operator.blocks.ValueBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.segment.spi.index.reader.JsonIndexReader;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.roaringbitmap.RoaringBitmap;
+
+
+/**
+ * The <code>JsonExtractIndexTransformFunction</code> provides the same behavior as JsonExtractScalar, with the
+ * implementation changed to read values from the JSON index. For large JSON blobs this can be faster than parsing
+ * GBs of JSON at query time. For small JSON blobs/highly filtered input this is generally slower than the *scalar
+ * implementation. The inflection point is highly dependent on the number of docs remaining post filter.
+ */
+public class JsonExtractIndexTransformFunction extends BaseTransformFunction {
+  public static final String FUNCTION_NAME = "jsonExtractIndex";
+
+  private TransformFunction _jsonFieldTransformFunction;
+  private String _jsonPathString;
+  private TransformResultMetadata _resultMetadata;
+  private JsonIndexReader _jsonIndexReader;
+  private Object _defaultValue;
+  private Map<String, RoaringBitmap> _jsonIndexReaderContext;
+
+  @Override
+  public String getName() {
+    return FUNCTION_NAME;
+  }
+
+  @Override
+  public void init(List<TransformFunction> arguments, Map<String, ColumnContext> columnContextMap) {
+    // Check that there are exactly 3 or 4 arguments
+    if (arguments.size() < 3 || arguments.size() > 4) {
+      throw new IllegalArgumentException(
+          "Expected 3/4 arguments for transform function: jsonExtractIndex(jsonFieldName, 'jsonPath', 'resultsType',"
+              + " ['defaultValue'])");
+    }
+
+    TransformFunction firstArgument = arguments.get(0);
+    if (firstArgument instanceof IdentifierTransformFunction) {
+      String columnName = ((IdentifierTransformFunction) firstArgument).getColumnName();
+      _jsonIndexReader = columnContextMap.get(columnName).getDataSource().getJsonIndex();
+      if (_jsonIndexReader == null) {
+        throw new IllegalStateException("jsonExtractIndex can only be applied on a column with JSON index");
+      }
+    } else {
+      throw new IllegalArgumentException("jsonExtractIndex can only be applied to a raw column");
+    }
+    _jsonFieldTransformFunction = firstArgument;
+    TransformFunction secondArgument = arguments.get(1);
+    if (!(secondArgument instanceof LiteralTransformFunction)) {
+      throw new IllegalArgumentException("JSON path argument must be a literal");
+    }
+    _jsonPathString = ((LiteralTransformFunction) secondArgument).getStringLiteral().substring(1); // remove $ prefix

Review Comment:
   Consider validating the value before calling `substring(1)`, e.g. check that the path literal is non-empty and starts with `$`
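
   A minimal sketch of what such validation could look like, assuming the intent is to reject path literals that are empty or do not start with `$` before the prefix is stripped. The class `JsonPathArgs` and method `stripRootPrefix` are hypothetical names used only for illustration and are not part of Pinot; the exact checks and messages are up to you.

   ```java
   // Hypothetical helper, not part of the PR: validates the JSON path literal
   // and returns it without the leading '$' root marker.
   final class JsonPathArgs {
     private JsonPathArgs() {
     }

     static String stripRootPrefix(String jsonPath) {
       // Reject null/empty input so substring(1) cannot throw an index error.
       if (jsonPath == null || jsonPath.isEmpty()) {
         throw new IllegalArgumentException("JSON path argument must not be empty");
       }
       // Assumption: every valid path starts with the '$' root marker.
       if (jsonPath.charAt(0) != '$') {
         throw new IllegalArgumentException("JSON path must start with '$', got: " + jsonPath);
       }
       return jsonPath.substring(1);
     }
   }
   ```

   With a check like this, a path such as `a.b` fails fast in `init` with a clear message instead of silently losing its first character.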



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/JsonIndexReader.java:
##########
@@ -31,4 +33,18 @@ public interface JsonIndexReader extends IndexReader {
    * Returns the matching document ids for the given filter.
    */
   MutableRoaringBitmap getMatchingDocIds(String filterString);
+
+  /**
+   * For an array of docIds and context specific to a JSON key, returns the corresponding values for each docId. The
+   * context should be created from the getMatchingDocsMap method.
+   *
+   * @return String[] where String[i] is the value for docIds[i]
+   */
+  String[] getValuesForKeyAndDocs(int[] docIds, Map<String, RoaringBitmap> context);

Review Comment:
   ```suggestion
     String[] getValuesForKeyAndDocs(int[] docIds, Map<String, RoaringBitmap> matchingDocsMap);
   ```
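
   To illustrate why `matchingDocsMap` describes the argument better than `context`, here is a rough sketch of the lookup contract stated in the Javadoc: the map goes from each value of a JSON key to the set of docIds holding that value, and the result has one entry per input docId. This is not the PR's implementation. `java.util.BitSet` stands in for `RoaringBitmap` so the snippet runs standalone, the class `MatchingDocsLookup` is a hypothetical name, and returning `null` for a docId with no match is an assumption.

   ```java
   import java.util.BitSet;
   import java.util.Map;

   // Hypothetical stand-in for the reader method; BitSet replaces RoaringBitmap.
   final class MatchingDocsLookup {
     private MatchingDocsLookup() {
     }

     // Returns values where values[i] is the value whose doc set contains docIds[i].
     static String[] getValuesForDocs(int[] docIds, Map<String, BitSet> matchingDocsMap) {
       String[] values = new String[docIds.length];
       for (int i = 0; i < docIds.length; i++) {
         for (Map.Entry<String, BitSet> entry : matchingDocsMap.entrySet()) {
           if (entry.getValue().get(docIds[i])) {
             values[i] = entry.getKey();
             break;
           }
         }
         // Assumption: values[i] stays null when no value matches docIds[i].
       }
       return values;
     }
   }
   ```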



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -294,6 +296,52 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) {
     }
   }
 
+  @Override
+  public Map<String, RoaringBitmap> getMatchingDocsMap(String key) {
+    Map<String, RoaringBitmap> cache = new HashMap<>();

Review Comment:
   Rename it to `matchingDocsMap`




