[ https://issues.apache.org/jira/browse/HIVE-26884?focusedWorklogId=835401&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-835401 ]

ASF GitHub Bot logged work on HIVE-26884:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Dec/22 20:07
            Start Date: 22/Dec/22 20:07
    Worklog Time Spent: 10m 
      Work Description: deniskuzZ commented on code in PR #3890:
URL: https://github.com/apache/hive/pull/3890#discussion_r1055812387


##########
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/vector/TestHiveIcebergVectorization.java:
##########
@@ -188,6 +191,46 @@ public void testHiveDeleteFilter() {
     validation.apply(1501);
   }
 
+  @Test
+  public void testHiveDeleteFilterWithFilteredParquetBlock() {
+    Assume.assumeTrue(
+        isVectorized && testTableType == TestTables.TestTableType.HIVE_CATALOG && fileFormat == FileFormat.PARQUET);
+
+    Schema schema = new Schema(
+        optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "customer_age", Types.IntegerType.get()),
+        optional(3, "date_col", Types.DateType.get())
+    );
+
+    // Generate 10600 records so that we end up with multiple batches to work with during the read.
+    List<Record> records = TestHelper.generateRandomRecords(schema, 10600, 0L);
+
+    // Fill id and date column with deterministic values
+    for (int i = 0; i < records.size(); ++i) {
+      records.get(i).setField("customer_id", (long) i);
+      if (i % 3 == 0) {
+        records.get(i).setField("date_col", Date.valueOf("2022-04-28"));
+      } else if (i % 3 == 1) {
+        records.get(i).setField("date_col", Date.valueOf("2022-04-29"));
+      } else {
+        records.get(i).setField("date_col", Date.valueOf("2022-04-30"));
+      }
+    }
+    Map<String, String> props = Maps.newHashMap();
+    props.put("parquet.block.size", "8192");
+    testTables.createTable(shell, "vectordelete", schema, PartitionSpec.unpartitioned(), fileFormat, records, 2, props);
+
+    // Check there are some rows before we do an update
+    List<Object[]> results = shell.executeStatement("select * from vectordelete where date_col=date'2022-04-29'");
+
+    Assert.assertNotEquals(0, results.size());
+
+    // Do an update on the column, and check that the count is 0, since we changed the value for that column
+    shell.executeStatement("update vectordelete set date_col=date'2022-04-30' where date_col=date'2022-04-29'");
+    results = shell.executeStatement("select * from vectordelete where date_col=date'2022-04-29'");

Review Comment:
   could we also verify count for date_col=date'2022-04-30'?





Issue Time Tracking
-------------------

    Worklog Id:     (was: 835401)
    Time Spent: 40m  (was: 0.5h)

> Iceberg: V2 Vectorization returns wrong results with deletes
> ------------------------------------------------------------
>
>                 Key: HIVE-26884
>                 URL: https://issues.apache.org/jira/browse/HIVE-26884
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> In case of Iceberg V2 reads, if we have delete files and a couple of Parquet
> blocks are skipped, the row number calculation is thrown off. That causes a
> mismatch with the delete filter's row positions and hence wrong results.
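The failure mode described above can be sketched in a few lines. This is not Iceberg's actual reader code; `RowPositionSketch`, `ROWS_PER_GROUP`, and both helper methods are hypothetical names used only to illustrate how a row counter that ignores skipped Parquet row groups diverges from the absolute positions that positional delete files record.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (hypothetical names, not Iceberg's API): positional
// delete files refer to a row's absolute position within a data file. If the
// reader skips a Parquet row group (e.g. via block-level filtering) but keeps
// counting only the rows it actually reads, its reported positions no longer
// line up with the positions recorded in the delete file.
public class RowPositionSketch {
  static final int ROWS_PER_GROUP = 100;

  // Buggy variant: the position counter is not advanced over skipped groups.
  static List<Long> naivePositions(boolean[] groupSkipped) {
    List<Long> positions = new ArrayList<>();
    long pos = 0;
    for (boolean skipped : groupSkipped) {
      if (skipped) {
        continue; // bug: skipped rows are not accounted for
      }
      for (int r = 0; r < ROWS_PER_GROUP; r++) {
        positions.add(pos++);
      }
    }
    return positions;
  }

  // Corrected variant: the counter jumps past every skipped group, so the
  // positions reported for the rows that are read stay absolute.
  static List<Long> correctedPositions(boolean[] groupSkipped) {
    List<Long> positions = new ArrayList<>();
    long pos = 0;
    for (boolean skipped : groupSkipped) {
      if (skipped) {
        pos += ROWS_PER_GROUP; // account for the rows we did not read
        continue;
      }
      for (int r = 0; r < ROWS_PER_GROUP; r++) {
        positions.add(pos++);
      }
    }
    return positions;
  }

  public static void main(String[] args) {
    // Group 0 is skipped by a block-level filter; groups 1 and 2 are read.
    boolean[] skipped = {true, false, false};
    List<Long> naive = naivePositions(skipped);
    List<Long> fixed = correctedPositions(skipped);

    // The first row actually read is absolute row 100. The naive reader calls
    // it row 0, so a delete filter keyed on absolute positions drops the
    // wrong rows; the corrected reader matches the delete file.
    System.out.println("naive reports first read row at position " + naive.get(0));
    System.out.println("fixed reports first read row at position " + fixed.get(0));
  }
}
```

The test in this PR exercises the same situation end to end: a small `parquet.block.size` forces multiple row groups, and the date predicate lets the reader skip some of them while a delete file is in play.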



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
