This is an automated email from the ASF dual-hosted git repository.

laszlog pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit df7aac9517bb3777f15e583100a087e4d3525ece
Author: Gabor Kaszab <[email protected]>
AuthorDate: Tue Apr 9 09:59:51 2024 +0200

    IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans

    When a table is partitioned, IcebergScanNode sorts the file descriptors
    for better scheduling. However, the list of file descriptors comes from
    IcebergContentFileStore and is shared between different SELECT queries
    on the table. When another query iterates over the list of file
    descriptors while IcebergScanNode is sorting it, a
    ConcurrentModificationException is thrown.

    To solve this, IcebergScanNode now creates its own copy of the file
    descriptor list so that it does not interfere with other queries.

    Manual testing:
    Sending 300-400 SELECT * queries on Iceberg tables to Impala in a loop
    reliably reproduced the original issue. With the fix, the issue is gone.
    The queries used for the repro:
    1: select * from functional_parquet.iceberg_v2_partitioned_position_deletes_orc a,
       functional_parquet.iceberg_partitioned_orc_external b
       where a.action = b.action and b.id=3;
    2: select * from functional_parquet.iceberg_v2_equality_delete_schema_evolution;

    Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3
    Reviewed-on: http://gerrit.cloudera.org:8080/21267
    Reviewed-by: Impala Public Jenkins <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
---
 fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java b/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
index 90cce6edb..ad9928a3a 100644
--- a/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
+++ b/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
@@ -102,6 +102,9 @@ public class IcebergScanNode extends HdfsScanNode {
     if (((FeIcebergTable)tblRef.getTable()).isPartitioned()) {
       // Let's order the file descriptors for better scheduling.
       // See IMPALA-12765 for details.
+      // Create a clone of the original file descriptor list to avoid getting
+      // ConcurrentModificationException when sorting.
+      fileDescs_ = new ArrayList<>(fileDescs_);
       Collections.sort(fileDescs_);
       filesAreSorted_ = true;
     }
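The failure mode fixed by this commit can be reproduced deterministically in a single thread, because ArrayList iterators are fail-fast. In OpenJDK's implementation, ArrayList.sort (which Collections.sort delegates to) increments the list's internal modification count, so an iterator that was already open on the list throws ConcurrentModificationException on its next call to next(). The sketch below is a minimal, hypothetical illustration and not Impala code: `SortCopyDemo`, `planScan` and `iteratorFailsFast` are made-up names, and plain strings stand in for the file descriptors. In Impala the iteration and the sort run on different threads, so the failure depends on timing (hence the loop of several hundred queries); here the two steps are interleaved by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class SortCopyDemo {
    // Hypothetical planning step: sorts the file list it is given, optionally
    // after taking a private copy first (the approach taken by the fix above).
    static List<String> planScan(List<String> fileDescs, boolean copyBeforeSort) {
        if (copyBeforeSort) {
            // Shallow copy: only the list is new, the elements stay shared.
            fileDescs = new ArrayList<>(fileDescs);
        }
        // In-place sort; on an ArrayList this bumps the modification count.
        Collections.sort(fileDescs);
        return fileDescs;
    }

    // Simulates one query iterating over the shared list while another query
    // plans a scan over the same list. Returns true if the iterator failed fast.
    static boolean iteratorFailsFast(boolean copyBeforeSort) {
        // Hypothetical stand-in for the list shared via IcebergContentFileStore.
        List<String> shared = new ArrayList<>(Arrays.asList("c.parq", "a.parq", "b.parq"));
        try {
            Iterator<String> it = shared.iterator();
            while (it.hasNext()) {
                it.next(); // throws CME if the list was sorted in place meanwhile
                planScan(shared, copyBeforeSort);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("sort shared list in place: CME = " + iteratorFailsFast(false));
        System.out.println("sort a private copy:       CME = " + iteratorFailsFast(true));
    }
}
```

Running `main` should report a ConcurrentModificationException only for the in-place variant. A shallow copy is enough here because sorting only reorders references: the descriptor objects can stay shared between queries, at the cost of one O(n) list copy per planned scan.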
