[GitHub] [orc] pavibhai commented on a change in pull request #635: ORC-742: LazyIO for non-filter columns

GitBox Thu, 18 Feb 2021 15:46:57 -0800


pavibhai commented on a change in pull request #635:
URL: https://github.com/apache/orc/pull/635#discussion_r578826252




##########
File path: java/core/src/java/org/apache/orc/filter/OrcFilterContext.java
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.filter;
+
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.StructColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.filter.MutableFilterContext;
+import org.apache.orc.TypeDescription;
+import org.jetbrains.annotations.NotNull;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Pattern;
+
+/**
+ * This defines the input for any filter operation.
+ * <p>
+ * It offers a convenience method for finding the column vector from a given 
name, that the filters
+ * can invoke to get access to the column vector.
+ */
+public class OrcFilterContext implements MutableFilterContext {
+
+  VectorizedRowBatch batch = null;
+  // Cache of field to ColumnVector, this is reset everytime the batch 
reference changes
+  private final Map<String, ColumnVector> vectors;
+  private final TypeDescription readSchema;
+
+  public OrcFilterContext(TypeDescription readSchema) {
+    this.readSchema = readSchema;
+    this.vectors = new HashMap<>();
+  }
+
+  public OrcFilterContext setBatch(@NotNull VectorizedRowBatch batch) {
+    if (batch != this.batch) {
+      this.batch = batch;
+      vectors.clear();
+    }
+    return this;
+  }
+
+  @Override
+  public void setFilterContext(boolean selectedInUse, int[] selected, int 
selectedSize) {
+    batch.setFilterContext(selectedInUse, selected, selectedSize);
+  }
+
+  @Override
+  public boolean validateSelected() {
+    return batch.validateSelected();
+  }
+
+  @Override
+  public int[] updateSelected(int i) {
+    return batch.updateSelected(i);
+  }
+
+  @Override
+  public void setSelectedInUse(boolean b) {
+    batch.setSelectedInUse(b);
+  }
+
+  @Override
+  public void setSelected(int[] ints) {
+    batch.setSelected(ints);
+  }
+
+  @Override
+  public void setSelectedSize(int i) {
+    batch.setSelectedSize(i);
+  }
+
+  @Override
+  public void reset() {
+    batch.reset();
+  }
+
+  @Override
+  public boolean isSelectedInUse() {
+    return batch.isSelectedInUse();
+  }
+
+  @Override
+  public int[] getSelected() {
+    return batch.getSelected();
+  }
+
+  @Override
+  public int getSelectedSize() {
+    return batch.getSelectedSize();
+  }
+
+  public ColumnVector[] getCols() {
+    return batch.cols;
+  }
+
+  /**
+   * Retrieves the column vector that matches the specified name. Allows 
support for nested struct
+   * references e.g. order.date where data is a field in a struct called order.
+   *
+   * @param name The column name whose vector should be retrieved
+   * @return The column vector
+   * @throws IllegalArgumentException if the field is not found or if the 
nested field is not part
+   *                                  of a struct
+   */
+  public ColumnVector findColumnVector(String name) {
+    if (!vectors.containsKey(name)) {
+      vectors.put(name, findVector(name));
+    }
+
+    return vectors.get(name);
+  }
+
+  private ColumnVector findVector(String name) {
+    String[] refs = name.split(SPLIT_ON_PERIOD);

Review comment:
       Maybe I am misinterpreting your comment. `RecordReaderImpl.findColumns` 
gives me id, but I need index to determine the column vector in the row batch.
   
   I can use the ParserUtils.splitName instead of custom split.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [orc] pavibhai commented on a change in pull request #635: ORC-742: LazyIO for non-filter columns

Reply via email to