omalley commented on a change in pull request #716:
URL: https://github.com/apache/orc/pull/716#discussion_r681236104



##########
File path: java/core/src/gen/filters/string_eq.txt
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.filter.impl;
+
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.StringExpr;
+import org.apache.orc.filter.impl.LeafFilter;
+
+import java.nio.charset.StandardCharsets;
+
+// This is generated from string_eq.txt
+public class <ClassName> extends LeafFilter {

Review comment:
       All of the leaf classes should be package-private.

##########
File path: java/core/pom.xml
##########
@@ -114,6 +119,10 @@
     <dependency>
       <groupId>net.bytebuddy</groupId>
       <artifactId>byte-buddy</artifactId>
+    </dependency>

Review comment:
       I don't think this dependency is required by the patch.

##########
File path: java/core/src/gen/filters/type_in.txt
##########
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.filter.impl;
+
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.<LeafVector>;
+import org.apache.orc.filter.impl.LeafFilter;
+
+import java.util.Arrays;
+import java.util.List;
+
+// This is generated from type_in.txt
+public class <ClassName> extends LeafFilter {
+  public final <LeafType>[] inValues;

Review comment:
       All of the fields should be private.

##########
File path: java/core/pom.xml
##########
@@ -37,6 +37,11 @@
       <groupId>org.apache.orc</groupId>
       <artifactId>orc-shims</artifactId>
     </dependency>
+    <dependency>

Review comment:
       I'm not convinced that the generator is worth the added complexity. I found myself generating the code and reviewing that instead.
       If we do keep the generator, I'd suggest a much more specific name like "filter-codegen".

##########
File path: java/core/pom.xml
##########
@@ -37,6 +37,11 @@
       <groupId>org.apache.orc</groupId>
       <artifactId>orc-shims</artifactId>
     </dependency>
+    <dependency>

Review comment:
       The other advantage to having the generated code is that the code refactoring and style check tools work on it.

##########
File path: java/core/src/java/org/apache/orc/filter/impl/OrFilter.java
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.filter.impl;
+
+import org.apache.orc.OrcFilterContext;
+import org.apache.orc.filter.VectorFilter;
+
+public class OrFilter implements VectorFilter {
+
+  public final VectorFilter[] filters;
+  private final Selected orOut = new Selected();
+
+  public OrFilter(VectorFilter[] filters) {
+    this.filters = filters;
+  }
+
+  public static void merge(Selected src, Selected tgt) {

Review comment:
       This should be moved to Selected and renamed to unionDistinct.
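
A minimal sketch of what a `unionDistinct` on the selection class could look like. The PR's actual `Selected` class is not shown in this thread, so `SelectedSketch`, its `sel` and `selSize` fields, and the `of` and `toArray` helpers are all hypothetical stand-ins. The sketch assumes both selections hold row indexes in ascending order and merges them so that the result stays sorted and contains each row once.

```java
import java.util.Arrays;

// Hypothetical stand-in for the PR's Selected class: a sorted array of
// selected row indexes plus a count of the valid entries.
final class SelectedSketch {
  int[] sel;
  int selSize;

  SelectedSketch(int capacity) {
    sel = new int[capacity];
  }

  // Convenience factory for the example; rows must be in ascending order.
  static SelectedSketch of(int... rows) {
    SelectedSketch s = new SelectedSketch(rows.length);
    System.arraycopy(rows, 0, s.sel, 0, rows.length);
    s.selSize = rows.length;
    return s;
  }

  // Merge the sorted rows of src into this selection, keeping the result
  // sorted and free of duplicates. src is not modified.
  void unionDistinct(SelectedSketch src) {
    int[] merged = new int[selSize + src.selSize];
    int i = 0;
    int j = 0;
    int n = 0;
    while (i < selSize && j < src.selSize) {
      int a = sel[i];
      int b = src.sel[j];
      if (a < b) {
        merged[n++] = a;
        i++;
      } else if (a > b) {
        merged[n++] = b;
        j++;
      } else {
        // Row present in both selections: emit it once.
        merged[n++] = a;
        i++;
        j++;
      }
    }
    while (i < selSize) {
      merged[n++] = sel[i++];
    }
    while (j < src.selSize) {
      merged[n++] = src.sel[j++];
    }
    sel = merged;
    selSize = n;
  }

  // Copy of the valid entries, for inspection.
  int[] toArray() {
    return Arrays.copyOf(sel, selSize);
  }
}
```

This version allocates a new array on every call for clarity. A production implementation would more likely reuse a preallocated buffer sized to the batch.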

##########
File path: java/core/src/java/org/apache/orc/filter/VectorFilter.java
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.filter;
+
+import org.apache.orc.OrcFilterContext;
+import org.apache.orc.filter.impl.Selected;
+
+/**
+ * A filter that operates on the supplied
+ * {@link org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch} and updates the selections.
+ *
+ * This is the interface that is the basis of both the leaf filters such as Equals, In and logical
+ * filters such as And, Or and Not
+ */
+public interface VectorFilter {
+
+  /**
+   * Filter the vectorized row batch that is wrapped into the FilterContext.
+   * @param fc     The filter context that contains the VectorizedRowBatch
+   * @param bound  The bound of the scan
+   * @param selIn  The current selection
+   * @param selOut The result selection
+   */
+  void filter(OrcFilterContext fc, Selected bound, Selected selIn, Selected selOut);

Review comment:
       I'd propose that we join bound and selIn into a single selection of the rows that should be checked. The documentation should make it clear that bound must not be modified. Furthermore, we should document that items in selOut that are not in bound must be retained, and that the selOut vector must be sorted.

##########
File path: java/core/src/java/org/apache/orc/filter/impl/LeafFilter.java
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.filter.impl;
+
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.orc.OrcFilterContext;
+import org.apache.orc.filter.VectorFilter;
+
+public abstract class LeafFilter implements VectorFilter {

Review comment:
       I'd suggest adding a boolean field to LeafFilter that records whether the filter is negated. It can be applied on the calls to accept at very low cost.
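
A rough sketch of the negation flag, under simplifying assumptions: the real `LeafFilter` operates on a `ColumnVector` and the PR's selection types, while this example uses plain arrays. The class names `NegatableLeafSketch` and `LongEqualsSketch` and the `matches` method are hypothetical. Null handling is ignored here, and a real implementation would need to decide how negation treats null values.

```java
// Hypothetical, simplified leaf filter carrying a negation flag.
abstract class NegatableLeafSketch {
  private final boolean negated;

  NegatableLeafSketch(boolean negated) {
    this.negated = negated;
  }

  // Subclasses decide whether one row matches the un-negated predicate.
  protected abstract boolean matches(long[] values, int row);

  // Checks rows[0..size) and writes the accepted row indexes into out,
  // preserving their order. Returns the number of accepted rows.
  int filter(long[] values, int[] rows, int size, int[] out) {
    int n = 0;
    for (int k = 0; k < size; k++) {
      int row = rows[k];
      // A single boolean comparison flips the predicate when negated.
      if (matches(values, row) != negated) {
        out[n++] = row;
      }
    }
    return n;
  }
}

// Example leaf: equality against a long literal.
final class LongEqualsSketch extends NegatableLeafSketch {
  private final long literal;

  LongEqualsSketch(long literal, boolean negated) {
    super(negated);
    this.literal = literal;
  }

  @Override
  protected boolean matches(long[] values, int row) {
    return values[row] == literal;
  }
}
```

With this shape, a NOT wrapped around a leaf predicate can be folded into the leaf itself by constructing it with `negated = true`, rather than evaluating a separate Not filter.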

##########
File path: java/core/pom.xml
##########
@@ -37,6 +37,11 @@
       <groupId>org.apache.orc</groupId>
       <artifactId>orc-shims</artifactId>
     </dependency>
+    <dependency>

Review comment:
       I pushed the results of removing the code generation to a [fork](https://github.com/omalley/orc/tree/orc-743) of this branch.





