This is an automated email from the ASF dual-hosted git repository.

laszlog pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit df7aac9517bb3777f15e583100a087e4d3525ece
Author: Gabor Kaszab <[email protected]>
AuthorDate: Tue Apr 9 09:59:51 2024 +0200

    IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans
    
    When a table is partitioned, IcebergScanNode sorts the file descriptors
    for better scheduling. However, the list of file descriptors comes from
    IcebergContentFileStore and is shared between different select queries
    on the table. When another query iterates over the list of file
    descriptors while IcebergScanNode is sorting it, we get a
    ConcurrentModificationException.
    To solve this, IcebergScanNode now creates its own copy of the file
    descriptor list so that it does not interfere with other queries.
    
    Manual testing:
    300-400 SELECT * Iceberg queries were sent to Impala in a loop, which
    reliably reproduced the original issue. With the fix the issue is
    gone.
    The queries used for the repro:
    1:
    select *
    from functional_parquet.iceberg_v2_partitioned_position_deletes_orc a,
    functional_parquet.iceberg_partitioned_orc_external b
    where a.action = b.action and b.id=3;
    2:
    select *
    from functional_parquet.iceberg_v2_equality_delete_schema_evolution;
    
    Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3
    Reviewed-on: http://gerrit.cloudera.org:8080/21267
    Reviewed-by: Impala Public Jenkins <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
---
 fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java b/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
index 90cce6edb..ad9928a3a 100644
--- a/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
+++ b/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
@@ -102,6 +102,9 @@ public class IcebergScanNode extends HdfsScanNode {
     if (((FeIcebergTable)tblRef.getTable()).isPartitioned()) {
       // Let's order the file descriptors for better scheduling.
       // See IMPALA-12765 for details.
+      // Create a clone of the original file descriptor list to avoid getting
+      // ConcurrentModificationException when sorting.
+      fileDescs_ = new ArrayList<>(fileDescs_);
       Collections.sort(fileDescs_);
       filesAreSorted_ = true;
     }
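
The failure mode described in the commit message can be illustrated with a
small standalone sketch. It is not Impala code: the class name, the method
name, and the file names are hypothetical, and it collapses the two
concurrent queries into a single thread so the outcome is deterministic. It
relies only on the documented fail-fast behavior of ArrayList iterators,
which treat an in-place sort of the backing list as a modification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class SharedListSortSketch {
  // Returns true if sorting 'toSort' while 'shared' is being iterated
  // makes the in-flight iteration fail with
  // ConcurrentModificationException.
  static boolean iterationFails(List<String> shared, List<String> toSort) {
    Iterator<String> it = shared.iterator();
    it.next();                 // stands in for another query mid-iteration
    Collections.sort(toSort);  // stands in for the scan node sorting
    try {
      it.next();               // the other query continues iterating
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // Hypothetical file names standing in for file descriptors.
    List<String> shared =
        new ArrayList<>(Arrays.asList("c.orc", "a.orc", "b.orc"));

    // Sorting the shared list in place breaks the in-flight iterator.
    System.out.println(iterationFails(shared, shared));

    // Sorting a private copy leaves the shared list untouched.
    List<String> shared2 =
        new ArrayList<>(Arrays.asList("c.orc", "a.orc", "b.orc"));
    System.out.println(iterationFails(shared2, new ArrayList<>(shared2)));
  }
}
```

The copy is shallow: only the list itself is duplicated, while the elements
are still shared. That is sufficient here because sorting reorders the list
and does not modify the elements.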
