[
https://issues.apache.org/jira/browse/IMPALA-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835884#comment-17835884
]
ASF subversion and git services commented on IMPALA-12970:
----------------------------------------------------------
Commit df7aac9517bb3777f15e583100a087e4d3525ece in impala's branch
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=df7aac951 ]
IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans
When a table is partitioned, IcebergScanNode sorts the file descriptors
for better scheduling. However, the list of file descriptors comes from
IcebergContentFileStore and is shared between different select queries
on the table. If another query iterates over the list of file
descriptors while IcebergScanNode is sorting it, that query fails with
a ConcurrentModificationException.
To solve this, IcebergScanNode now creates its own copy of the file
descriptor list so that it does not interfere with other queries.
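The failure mode and the fix can be illustrated with a minimal, self-contained
Java sketch. It is not the Impala code: the class name, the method names and
the string "descriptors" are hypothetical stand-ins, and the interleaving is
simulated on a single thread rather than with two concurrent queries. The
mechanism is the same, though: sorting an ArrayList in place invalidates any
iterator that is already open on it, while sorting a private copy does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class SharedListSortSketch {
  // Hypothetical stand-in for the cached, shared file descriptor list.
  static List<String> newSharedList() {
    return new ArrayList<>(Arrays.asList("part=c/f3", "part=a/f1", "part=b/f2"));
  }

  // Problematic pattern: one reader is mid-iteration while another caller
  // sorts the shared list in place. Returns true if the reader then fails.
  static boolean inPlaceSortBreaksIterator(List<String> shared) {
    Iterator<String> otherQuery = shared.iterator();
    otherQuery.next();
    shared.sort(Comparator.naturalOrder()); // structural change seen by the iterator
    try {
      otherQuery.next();
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  // Pattern described in the fix: sort a private copy, leave the shared list alone.
  static List<String> sortedCopy(List<String> shared) {
    List<String> copy = new ArrayList<>(shared);
    copy.sort(Comparator.naturalOrder());
    return copy;
  }

  public static void main(String[] args) {
    System.out.println("in-place sort breaks reader: "
        + inPlaceSortBreaksIterator(newSharedList()));

    List<String> shared = newSharedList();
    Iterator<String> otherQuery = shared.iterator();
    otherQuery.next();
    List<String> mine = sortedCopy(shared);
    otherQuery.next(); // the shared list was not touched, so this still works
    System.out.println("sorted copy: " + mine + ", shared: " + shared);
  }
}
```

The copy costs one extra list allocation per scan node, which is the price of
leaving the shared list read-only from the point of view of every query.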
Manual testing:
300-400 SELECT * Iceberg queries were sent to Impala in a loop, which
reliably reproduced the original issue. With the fix, the issue is
gone.
The queries used for the repro:
1:
select *
from functional_parquet.iceberg_v2_partitioned_position_deletes_orc a,
functional_parquet.iceberg_partitioned_orc_external b
where a.action = b.action and b.id=3;
2:
select *
from functional_parquet.iceberg_v2_equality_delete_schema_evolution;
Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3
Reviewed-on: http://gerrit.cloudera.org:8080/21267
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Test failure at test_read_equality_deletes in test_iceberg in exhaustive build
> ------------------------------------------------------------------------------
>
> Key: IMPALA-12970
> URL: https://issues.apache.org/jira/browse/IMPALA-12970
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Yida Wu
> Assignee: Gabor Kaszab
> Priority: Major
> Labels: broken-build
>
> An error is observed in the data-cache exhaustive build in
> test_read_equality_deletes with the following message:
> {code:java}
> query_test.test_iceberg.TestIcebergV2Table.test_read_equality_deletes[protocol:
> beeswax | table_format: parquet/none | exec_option: {'test_replan': 1,
> 'disable_optimized_iceberg_v2_read': 1, 'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0}] (from pytest)
> {code}
> *Error Message*
> {code:java}
> query_test/test_iceberg.py:1456: in test_read_equality_deletes
> self.run_test_case('QueryTest/iceberg-v2-read-equality-deletes', vector)
> common/impala_test_suite.py:725: in run_test_case result = exec_fn(query,
> user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:660: in __exec_in_impala result =
> self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:1013: in __execute_query return
> impalad_client.execute(query, user=user) common/impala_connection.py:215: in
> execute fetch_profile_after_close=fetch_profile_after_close)
> beeswax/impala_beeswax.py:191: in execute handle =
> self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:382: in __execute_query handle =
> self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:376: in execute_query_async handle =
> self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:539: in __do_rpc raise
> ImpalaBeeswaxException(self.__build_error_message(b), b) E
> ImpalaBeeswaxException: ImpalaBeeswaxException: E INNER EXCEPTION: <class
> 'beeswaxd.ttypes.BeeswaxException'> E MESSAGE:
> ConcurrentModificationException: null
> {code}
> *Stacktrace*
> {code:java}
> query_test/test_iceberg.py:1456: in test_read_equality_deletes
> self.run_test_case('QueryTest/iceberg-v2-read-equality-deletes', vector)
> common/impala_test_suite.py:725: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:660: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:1013: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:215: in execute
> fetch_profile_after_close=fetch_profile_after_close)
> beeswax/impala_beeswax.py:191: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:382: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:376: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:539: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E ImpalaBeeswaxException: ImpalaBeeswaxException:
> E INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
> E MESSAGE: ConcurrentModificationException: null
> {code}
> *Standard Error*
> {code:java}
> SET
> client_identifier=query_test/test_iceberg.py::TestIcebergV2Table::()::test_read_equality_deletes[protocol:beeswax|table_format:parquet/none|exec_option:{'test_replan':1;'disable_optimized_iceberg_v2_read':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'d;
> -- connecting to: localhost:21000
> -- 2024-04-03 07:04:53,469 INFO MainThread: Could not connect to ('::1',
> 21000, 0, 0)
> Traceback (most recent call last):
> File
> "/data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TSocket.py",
> line 137, in open
> handle.connect(sockaddr)
> File
> "/data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/socket.py",
> line 228, in meth
> return getattr(self._sock,name)(*args)
> error: [Errno 111] Connection refused
> -- connecting to localhost:21050 with impyla
> -- 2024-04-03 07:04:53,469 INFO MainThread: Could not connect to ('::1',
> 21050, 0, 0)
> Traceback (most recent call last):
> File
> "/data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TSocket.py",
> line 137, in open
> handle.connect(sockaddr)
> File
> "/data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/socket.py",
> line 228, in meth
> return getattr(self._sock,name)(*args)
> error: [Errno 111] Connection refused
> -- 2024-04-03 07:04:53,481 INFO MainThread: Closing active operation
> -- connecting to localhost:28000 with impyla
> -- 2024-04-03 07:04:53,497 INFO MainThread: Closing active operation
> SET
> client_identifier=query_test/test_iceberg.py::TestIcebergV2Table::()::test_read_equality_deletes[protocol:beeswax|table_format:parquet/none|exec_option:{'test_replan':1;'disable_optimized_iceberg_v2_read':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'d;
> -- executing against localhost:21000
> use functional_parquet;
> -- 2024-04-03 07:04:53,508 INFO MainThread: Started query
> 2c4cf2169ce92fb8:83ac1fe100000000
> SET
> client_identifier=query_test/test_iceberg.py::TestIcebergV2Table::()::test_read_equality_deletes[protocol:beeswax|table_format:parquet/none|exec_option:{'test_replan':1;'disable_optimized_iceberg_v2_read':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'d;
> SET test_replan=1;
> SET disable_optimized_iceberg_v2_read=1;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> -- 2024-04-03 07:04:53,508 INFO MainThread: Loading query test file:
> /data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_equality;
> -- 2024-04-03 07:05:00,815 INFO MainThread: Started query
> 5f4a1d542dd45959:34b421c700000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_equality for
> system_version as of 5763349507283783091;
> -- 2024-04-03 07:05:01,068 INFO MainThread: Started query
> 0b4eb99bbd973898:e04fbb0800000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_equality for system_time
> as of now();
> -- 2024-04-03 07:05:01,222 INFO MainThread: Started query
> d1418660c4de19d2:b3d126b500000000
> -- executing against localhost:21000
> select *, ICEBERG__DATA__SEQUENCE__NUMBER from
> functional_parquet.iceberg_v2_delete_equality_nulls;
> -- 2024-04-03 07:05:06,830 INFO MainThread: Started query
> c940cf88ff46041f:36ca8a9b00000000
> -- executing against localhost:21000
> select *, ICEBERG__DATA__SEQUENCE__NUMBER from
> functional_parquet.iceberg_v2_delete_equality_nulls
> for system_version as of 4346796256488077976;
> -- 2024-04-03 07:05:06,999 INFO MainThread: Started query
> 2d40d5be29152295:902d277600000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_both_eq_and_pos;
> -- 2024-04-03 07:05:07,058 INFO MainThread: Started query
> 63435f4ab3c2fe59:6687337600000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_equality_partitioned order
> by d, s;
> -- 2024-04-03 07:05:07,325 INFO MainThread: Started query
> 04433e3771af06b9:1c23d90600000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_equality_multi_eq_ids;
> -- 2024-04-03 07:05:12,858 INFO MainThread: Started query
> 544e0e67584e7565:fb3dfe1700000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_equality_multi_eq_ids
> for system_version as of 4077234998626563290;
> -- 2024-04-03 07:05:13,372 INFO MainThread: Started query
> 2943018f515fc971:8f38a21b00000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_equality_multi_eq_ids
> for system_version as of 8127619959873391049;
> -- 2024-04-03 07:05:13,537 INFO MainThread: Started query
> b54ea6c2f88c161c:d7044f7100000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_pos_and_multi_eq_ids;
> -- 2024-04-03 07:05:18,873 INFO MainThread: Started query
> 434ca4be277c05c3:4e895df600000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_delete_pos_and_multi_eq_ids
> for system_version as of 152862018760071153;
> -- 2024-04-03 07:05:19,231 INFO MainThread: Started query
> e94df499c59205ee:1fa5394100000000
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_v2_equality_delete_schema_evolution;
> {code}