[ 
https://issues.apache.org/jira/browse/IMPALA-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835884#comment-17835884
 ] 

ASF subversion and git services commented on IMPALA-12970:
----------------------------------------------------------

Commit df7aac9517bb3777f15e583100a087e4d3525ece in impala's branch 
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=df7aac951 ]

IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans

When a table is partitioned, IcebergScanNode sorts the file descriptors
for better scheduling. However, the list of file descriptors comes from
IcebergContentFileStore and is shared between different select queries
on the table. If one query iterates over the list while the
IcebergScanNode of another query sorts it in place, the iterating query
fails with a ConcurrentModificationException.
To solve this, IcebergScanNode now creates its own copy of the file
descriptor list so that sorting does not interfere with other queries.
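
The failure mode and the fix described above can be sketched in plain Java.
This is a minimal illustration, not the actual Impala code: the class name,
the method names and the use of String in place of real file descriptors
are all assumptions made for the example. It is single-threaded on purpose,
so the exception is deterministic; in Impala the reader and the sorter
belong to different queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class SharedListSortSketch {
    // Hypothetical stand-in for the cached, shared file descriptor list.
    static List<String> newSharedList() {
        return new ArrayList<>(Arrays.asList("c.orc", "a.orc", "b.orc"));
    }

    // Problem: sorting the shared list in place invalidates any iterator
    // that another reader already holds on the same list.
    static boolean sortInPlaceBreaksReader() {
        List<String> shared = newSharedList();
        Iterator<String> reader = shared.iterator();
        reader.next();
        shared.sort(Comparator.naturalOrder()); // ArrayList.sort bumps modCount
        try {
            reader.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Fix: sort a private copy and leave the shared list untouched.
    static List<String> sortedCopy(List<String> shared) {
        List<String> copy = new ArrayList<>(shared);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // With the copy, a reader that is mid-iteration is not disturbed.
    static boolean sortCopyKeepsReaderSafe() {
        List<String> shared = newSharedList();
        Iterator<String> reader = shared.iterator();
        reader.next();
        sortedCopy(shared);
        try {
            reader.next();
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("in-place sort breaks reader: " + sortInPlaceBreaksReader());
        System.out.println("sorted copy keeps reader safe: " + sortCopyKeepsReaderSafe());
    }
}
```

The copy costs one extra list allocation per scan node, but it means no
query mutates the shared list any more, so readers need no locking.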

Manual testing:
Sending 300-400 SELECT * Iceberg queries to Impala in a loop reliably
reproduced the original issue. With the fix, the issue no longer
occurs.
The queries used for the repro:
1:
select *
from functional_parquet.iceberg_v2_partitioned_position_deletes_orc a,
functional_parquet.iceberg_partitioned_orc_external b
where a.action = b.action and b.id=3;
2:
select *
from functional_parquet.iceberg_v2_equality_delete_schema_evolution;

Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3
Reviewed-on: http://gerrit.cloudera.org:8080/21267
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Test failure at test_read_equality_deletes in test_iceberg in exhaustive build
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-12970
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12970
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Yida Wu
>            Assignee: Gabor Kaszab
>            Priority: Major
>              Labels: broken-build
>
> An error is observed in the data-cache exhaustive build in 
> test_read_equality_deletes with the following message:
> {code:java}
> query_test.test_iceberg.TestIcebergV2Table.test_read_equality_deletes[protocol:
>  beeswax | table_format: parquet/none | exec_option: {'test_replan': 1, 
> 'disable_optimized_iceberg_v2_read': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0}] (from pytest)
> {code}
> *Error Message*
> {code:java}
> query_test/test_iceberg.py:1456: in test_read_equality_deletes
>     self.run_test_case('QueryTest/iceberg-v2-read-equality-deletes', vector)
> common/impala_test_suite.py:725: in run_test_case
>     result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:660: in __exec_in_impala
>     result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:1013: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:215: in execute
>     fetch_profile_after_close=fetch_profile_after_close)
> beeswax/impala_beeswax.py:191: in execute
>     handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:382: in __execute_query
>     handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:376: in execute_query_async
>     handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:539: in __do_rpc
>     raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
> E    MESSAGE: ConcurrentModificationException: null
> {code}
> *Stacktrace*
> {code:java}
> query_test/test_iceberg.py:1456: in test_read_equality_deletes
>     self.run_test_case('QueryTest/iceberg-v2-read-equality-deletes', vector)
> common/impala_test_suite.py:725: in run_test_case
>     result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:660: in __exec_in_impala
>     result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:1013: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:215: in execute
>     fetch_profile_after_close=fetch_profile_after_close)
> beeswax/impala_beeswax.py:191: in execute
>     handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:382: in __execute_query
>     handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:376: in execute_query_async
>     handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:539: in __do_rpc
>     raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
> E    MESSAGE: ConcurrentModificationException: null
> {code}
> *Standard Error*
> {code:java}
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergV2Table::()::test_read_equality_deletes[protocol:beeswax|table_format:parquet/none|exec_option:{'test_replan':1;'disable_optimized_iceberg_v2_read':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'d;
> -- connecting to: localhost:21000
> -- 2024-04-03 07:04:53,469 INFO     MainThread: Could not connect to ('::1', 
> 21000, 0, 0)
> Traceback (most recent call last):
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TSocket.py",
>  line 137, in open
>     handle.connect(sockaddr)
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/socket.py",
>  line 228, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 111] Connection refused
> -- connecting to localhost:21050 with impyla
> -- 2024-04-03 07:04:53,469 INFO     MainThread: Could not connect to ('::1', 
> 21050, 0, 0)
> Traceback (most recent call last):
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TSocket.py",
>  line 137, in open
>     handle.connect(sockaddr)
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/socket.py",
>  line 228, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 111] Connection refused
> -- 2024-04-03 07:04:53,481 INFO     MainThread: Closing active operation
> -- connecting to localhost:28000 with impyla
> -- 2024-04-03 07:04:53,497 INFO     MainThread: Closing active operation
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergV2Table::()::test_read_equality_deletes[protocol:beeswax|table_format:parquet/none|exec_option:{'test_replan':1;'disable_optimized_iceberg_v2_read':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'d;
> -- executing against localhost:21000
> use functional_parquet;
> -- 2024-04-03 07:04:53,508 INFO     MainThread: Started query 
> 2c4cf2169ce92fb8:83ac1fe100000000
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergV2Table::()::test_read_equality_deletes[protocol:beeswax|table_format:parquet/none|exec_option:{'test_replan':1;'disable_optimized_iceberg_v2_read':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'d;
> SET test_replan=1;
> SET disable_optimized_iceberg_v2_read=1;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> -- 2024-04-03 07:04:53,508 INFO     MainThread: Loading query test file: 
> /data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_equality;
> -- 2024-04-03 07:05:00,815 INFO     MainThread: Started query 
> 5f4a1d542dd45959:34b421c700000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_equality for 
> system_version as of 5763349507283783091;
> -- 2024-04-03 07:05:01,068 INFO     MainThread: Started query 
> 0b4eb99bbd973898:e04fbb0800000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_equality for system_time 
> as of now();
> -- 2024-04-03 07:05:01,222 INFO     MainThread: Started query 
> d1418660c4de19d2:b3d126b500000000
> -- executing against localhost:21000
> select *, ICEBERG__DATA__SEQUENCE__NUMBER from 
> functional_parquet.iceberg_v2_delete_equality_nulls;
> -- 2024-04-03 07:05:06,830 INFO     MainThread: Started query 
> c940cf88ff46041f:36ca8a9b00000000
> -- executing against localhost:21000
> select *, ICEBERG__DATA__SEQUENCE__NUMBER from 
> functional_parquet.iceberg_v2_delete_equality_nulls
> for system_version as of 4346796256488077976;
> -- 2024-04-03 07:05:06,999 INFO     MainThread: Started query 
> 2d40d5be29152295:902d277600000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_both_eq_and_pos;
> -- 2024-04-03 07:05:07,058 INFO     MainThread: Started query 
> 63435f4ab3c2fe59:6687337600000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_equality_partitioned order 
> by d, s;
> -- 2024-04-03 07:05:07,325 INFO     MainThread: Started query 
> 04433e3771af06b9:1c23d90600000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_equality_multi_eq_ids;
> -- 2024-04-03 07:05:12,858 INFO     MainThread: Started query 
> 544e0e67584e7565:fb3dfe1700000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_equality_multi_eq_ids
>   for system_version as of 4077234998626563290;
> -- 2024-04-03 07:05:13,372 INFO     MainThread: Started query 
> 2943018f515fc971:8f38a21b00000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_equality_multi_eq_ids
>   for system_version as of 8127619959873391049;
> -- 2024-04-03 07:05:13,537 INFO     MainThread: Started query 
> b54ea6c2f88c161c:d7044f7100000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_pos_and_multi_eq_ids;
> -- 2024-04-03 07:05:18,873 INFO     MainThread: Started query 
> 434ca4be277c05c3:4e895df600000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_pos_and_multi_eq_ids
>   for system_version as of 152862018760071153;
> -- 2024-04-03 07:05:19,231 INFO     MainThread: Started query 
> e94df499c59205ee:1fa5394100000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_equality_delete_schema_evolution;
> {code}


