[GitHub] [orc] pavibhai commented on a diff in pull request #1482: ORC-1413 fix for ORC row level filter issue with ACID table

via GitHub Mon, 01 May 2023 16:35:29 -0700


pavibhai commented on code in PR #1482:
URL: https://github.com/apache/orc/pull/1482#discussion_r1181965794



##########
java/core/src/java/org/apache/orc/impl/ParserUtils.java:
##########
@@ -459,10 +459,31 @@ public static ColumnVector[] 
findColumnVectors(TypeDescription schema,
                                                  boolean isCaseSensitive,
                                                  VectorizedRowBatch batch) {
     List<String> names = ParserUtils.splitName(source);
-    ColumnFinder result = new ColumnFinder(schema, batch, names.size());
+    ColumnFinder result = new ColumnFinder(schema, 
removeAcidColumnsFromVectorizedRowBatch(schema, batch), names.size());
     findColumn(removeAcid(schema), names, isCaseSensitive, result);
     return result.result;
   }
+  

Review Comment:
   I would think we should go a different route. My proposal would be something 
like.
   
   If the schema is acid schema then when you are looking up column names 
prefix them with `row.` and just use the full schema. This will ensure that we 
respect the nesting of the vectors under the struct column vector related to 
`row`



##########
java/core/src/test/org/apache/orc/TestOrcFilterContext.java:
##########
@@ -225,4 +230,12 @@ public void testRepeatingVector() {
     assertTrue(OrcFilterContext.isNull(vectorBranch, 1));
     assertTrue(OrcFilterContext.isNull(vectorBranch, 2));
   }
+  
+  @Test
+  public void testACIDTable(){

Review Comment:
   I would also add an actual filter test with data to ensure that the behavior 
is as expected.



##########
java/core/src/java/org/apache/orc/impl/ParserUtils.java:
##########
@@ -459,10 +459,31 @@ public static ColumnVector[] 
findColumnVectors(TypeDescription schema,
                                                  boolean isCaseSensitive,
                                                  VectorizedRowBatch batch) {
     List<String> names = ParserUtils.splitName(source);
-    ColumnFinder result = new ColumnFinder(schema, batch, names.size());
+    ColumnFinder result = new ColumnFinder(schema, 
removeAcidColumnsFromVectorizedRowBatch(schema, batch), names.size());
     findColumn(removeAcid(schema), names, isCaseSensitive, result);
     return result.result;
   }
+  
+  private static VectorizedRowBatch 
removeAcidColumnsFromVectorizedRowBatch(TypeDescription schema, 
VectorizedRowBatch batch) {
+    final int acidColumnOffset = 5;
+    if(SchemaEvolution.checkAcidSchema(schema)) {
+      VectorizedRowBatch vectorizedRowBatch = new 
VectorizedRowBatch(batch.cols.length - acidColumnOffset, batch.size);
+      System.arraycopy(batch.selected, 0, vectorizedRowBatch.selected, 0, 
vectorizedRowBatch.selected.length);
+      System.arraycopy(batch.projectedColumns, acidColumnOffset, 
vectorizedRowBatch.projectedColumns, 0, batch.projectedColumns.length - 
acidColumnOffset);
+      vectorizedRowBatch.setPartitionInfo(batch.getDataColumnCount(), 
batch.getPartitionColumnCount());
+      vectorizedRowBatch.selectedInUse = batch.selectedInUse;
+      vectorizedRowBatch.endOfFile = batch.endOfFile;
+
+      StructColumnVector structColumnVector = 
(StructColumnVector)(batch.cols[acidColumnOffset]);
+      ColumnVector[] columnVectors = new 
ColumnVector[structColumnVector.fields.length];
+      for (int i = 0; i < columnVectors.length; i++) {
+        columnVectors[i] = structColumnVector.fields[i];
+      }
+      vectorizedRowBatch.cols = columnVectors;
+      return vectorizedRowBatch;
+    }
+    return batch;

Review Comment:
   I might be missing the logic of where we are updating back the changes that 
are not referenced example batch.size back to the batch after the filter 
processing is complete.
   
   I go back to the earlier comment that we should do this differently.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [orc] pavibhai commented on a diff in pull request #1482: ORC-1413 fix for ORC row level filter issue with ACID table

Reply via email to