cxzl25 commented on code in PR #1244:
URL: https://github.com/apache/orc/pull/1244#discussion_r966794471
##########
java/core/src/test/org/apache/orc/impl/TestConvertTreeReaderFactory.java:
##########
@@ -639,4 +641,94 @@ private void testConvertToDateIncreasingSize() throws
Exception {
private void testConvertToBinaryIncreasingSize() throws Exception {
readORCFileIncreasingBatchSize("binary", BytesColumnVector.class);
}
+
+ @Test
+ public void testDecimalConvertInNullStripe() throws Exception {
+ try {
+ Configuration decimalConf = new Configuration(conf);
+ decimalConf.set(OrcConf.STRIPE_ROW_COUNT.getAttribute(), "1024");
+ decimalConf.set(OrcConf.ROWS_BETWEEN_CHECKS.getAttribute(), "1");
+
+ String typeStr = "decimal(5,1)";
+ TypeDescription schema = TypeDescription.fromString("struct<col1:" +
typeStr + ">");
+ Writer w = OrcFile.createWriter(testFilePath,
OrcFile.writerOptions(decimalConf).setSchema(schema));
+
+ VectorizedRowBatch b = schema.createRowBatch();
+ DecimalColumnVector f1 = (DecimalColumnVector) b.cols[0];
+ f1.isRepeating = true;
+ f1.set(0, (HiveDecimal) null);
+ b.size = 1024;
+ w.addRowBatch(b);
+ b.reset();
+ for (int i = 0; i < 1024; i++) {
+ f1.set(i, HiveDecimal.create(i + 1));
+ }
+ b.size = 1024;
+ w.addRowBatch(b);
+ b.reset();
+ w.close();
+
+ testDecimalConvertToLongInNullStripe();
+ testDecimalConvertToDoubleInNullStripe();
+ testDecimalConvertToStringInNullStripe();
+ testDecimalConvertToTimestampInNullStripe();
+ testDecimalConvertToDecimalInNullStripe();
+ } finally {
+ fs.delete(testFilePath, false);
+ }
+ }
+
+ private void readDecimalInNullStripe(String typeString, Class<?>
expectedColumnType,
+ String expectedResult) throws Exception {
+ Reader.Options options = new Reader.Options();
+ TypeDescription schema = TypeDescription.fromString("struct<col1:" +
typeString + ">");
+ options.schema(schema);
+ String expected = options.toString();
+
+ Configuration conf = new Configuration();
+
+ Reader reader = OrcFile.createReader(testFilePath,
OrcFile.readerOptions(conf));
+ RecordReader rows = reader.rows(options);
+ VectorizedRowBatch batch = schema.createRowBatch();
+
+ rows.nextBatch(batch);
+ assertEquals(1024, batch.size);
+ assertEquals(expected, options.toString());
+ assertEquals(batch.cols.length, 1);
+ assertEquals(batch.cols[0].getClass(), expectedColumnType);
+ assertTrue(batch.cols[0].isRepeating);
+ StringBuilder sb = new StringBuilder();
+ batch.cols[0].stringifyValue(sb, 1023);
+ assertEquals(sb.toString(), "null");
+
+ rows.nextBatch(batch);
+ assertEquals(1024, batch.size);
+ assertEquals(expected, options.toString());
+ assertEquals(batch.cols.length, 1);
+ assertEquals(batch.cols[0].getClass(), expectedColumnType);
+ assertFalse(batch.cols[0].isRepeating);
Review Comment:
isRepeating is true, which causes `ConvertTreeReader#convertVector` to
retain only the first value of the entire batch; the remaining values are
ignored, resulting in incorrect results.
In this case, all 1024 entries are 1.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]