[
https://issues.apache.org/jira/browse/IMPALA-11807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649924#comment-17649924
]
Noemi Pap-Takacs edited comment on IMPALA-11807 at 12/20/22 6:32 PM:
---------------------------------------------------------------------
I suspect that these tests failed because the test tables were created
differently than in the earlier tests.
'TestIcebergTable.test_avro_file_format' was already present in IMPALA-11158
and did not fail there.
The difference is that in that commit the test table was generated with a
dependent load, using previously uploaded files, whereas in IMPALA-11708 the
test was updated so that the table is generated directly by Hive.
These two tests ('TestIcebergTable.test_mixed_file_format' and
'TestIcebergTable.test_avro_file_format') are the only ones that use Hive
directly from the Impala shell to create the tables. The earlier tests used a
workaround instead. The process is described in detail in
[https://gerrit.cloudera.org/#/c/18847/13/testdata/data/README], but it
basically consists of these steps:
1. The Iceberg table was created and written to HDFS from the Hive shell.
2. The files were then downloaded from HDFS to be transformed, because _the
Avro metadata files contained absolute HDFS paths_ hardcoded in them.
3. With the paths in the metadata files rewritten, the files were uploaded again.
4. The table was created by Impala.
5. The files were loaded into the table.
It seems to me that we are facing a problem similar to the one that was solved
manually by the workaround described above.
I wonder, though, why this problem did not occur in the GVO tests.
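For illustration, the path-rewriting part of the workaround (steps 2-3 above) could be sketched roughly as below. The prefixes, file contents, and helper name are hypothetical; note that the real procedure in the README also has to patch the binary Avro manifest files, which would require Avro tooling rather than plain text replacement:

```python
# Hypothetical sketch of steps 2-3 of the workaround: rewrite the absolute
# HDFS paths that Hive hardcoded into an Iceberg metadata file, so the table
# can later be loaded from a different warehouse location.
# OLD_PREFIX/NEW_PREFIX are illustrative values, not taken from the README.

OLD_PREFIX = "hdfs://localhost:20500/test-warehouse"  # absolute path baked in by Hive
NEW_PREFIX = "/test-warehouse"                        # location-independent form


def rewrite_paths(metadata_text: str,
                  old_prefix: str = OLD_PREFIX,
                  new_prefix: str = NEW_PREFIX) -> str:
    """Replace every occurrence of the hardcoded HDFS prefix in the text."""
    return metadata_text.replace(old_prefix, new_prefix)


if __name__ == "__main__":
    sample = ('{"location": "hdfs://localhost:20500/test-warehouse/db/tbl", '
              '"manifest": "hdfs://localhost:20500/test-warehouse/db/tbl/'
              'metadata/m0.avro"}')
    print(rewrite_paths(sample))
```

This only covers the JSON metadata; the `snap-*.avro` manifest list files mentioned in the error would need to be decoded, edited, and re-encoded with an Avro library.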
> TestIcebergTable.test_avro_file_format and
> TestIcebergTable.test_mixed_file_format failed
> -----------------------------------------------------------------------------------------
>
> Key: IMPALA-11807
> URL: https://issues.apache.org/jira/browse/IMPALA-11807
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, Frontend
> Affects Versions: Impala 4.3.0
> Reporter: Wenzhe Zhou
> Assignee: Noemi Pap-Takacs
> Priority: Major
>
> TestIcebergTable.test_avro_file_format failed after merging patch
> IMPALA-11708 (Add support for mixed Iceberg tables with AVRO file format).
> {code:java}
> *Error Message*
> query_test/test_iceberg.py:906: in test_avro_file_format
> self.run_test_case('QueryTest/iceberg-avro', vector, unique_database)
> common/impala_test_suite.py:712: in run_test_case result = exec_fn(query,
> user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:650: in __exec_in_impala result =
> self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:986: in __execute_query return
> impalad_client.execute(query, user=user) common/impala_connection.py:212: in
> execute return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:189: in execute handle =
> self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:365: in __execute_query handle =
> self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:359: in execute_query_async handle =
> self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:522: in __do_rpc raise
> ImpalaBeeswaxException(self.__build_error_message(b), b) E
> ImpalaBeeswaxException: ImpalaBeeswaxException: E INNER EXCEPTION: <class
> 'beeswaxd.ttypes.BeeswaxException'> E MESSAGE: AnalysisException: Failed
> to load metadata for table: 'functional_parquet.iceberg_avro_format' E
> CAUSED BY: TableLoadingException: IcebergTableLoadingException: Error loading
> metadata for Iceberg table
> s3a://impala-test-uswest2-2/test-warehouse/functional_parquet.db/iceberg_avro_format
> E CAUSED BY: RuntimeIOException: Failed to open input stream for file:
> hdfs://localhost:20500/test-warehouse/functional_parquet.db/iceberg_avro_format/metadata/snap-5594844384179945437-1-6b11ef63-7b9a-48a5-a448-7cc329eb85ec.avro
> E CAUSED BY: ConnectException: Call From
> impala-ec2-centos79-m6i-4xlarge-ondemand-1b22.vpc.cloudera.com/127.0.0.1 to
> localhost:20500 failed on connection exception: java.net.ConnectException:
> Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused E CAUSED BY:
> ConnectException: Connection refused
> *Stacktrace*
> query_test/test_iceberg.py:906: in test_avro_file_format
> self.run_test_case('QueryTest/iceberg-avro', vector, unique_database)
> common/impala_test_suite.py:712: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:650: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:986: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:212: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:189: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:365: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:359: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:522: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E ImpalaBeeswaxException: ImpalaBeeswaxException:
> E INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
> E MESSAGE: AnalysisException: Failed to load metadata for table:
> 'functional_parquet.iceberg_avro_format'
> E CAUSED BY: TableLoadingException: IcebergTableLoadingException: Error
> loading metadata for Iceberg table
> s3a://impala-test-uswest2-2/test-warehouse/functional_parquet.db/iceberg_avro_format
> E CAUSED BY: RuntimeIOException: Failed to open input stream for file:
> hdfs://localhost:20500/test-warehouse/functional_parquet.db/iceberg_avro_format/metadata/snap-5594844384179945437-1-6b11ef63-7b9a-48a5-a448-7cc329eb85ec.avro
> E CAUSED BY: ConnectException: Call From
> impala-ec2-centos79-m6i-4xlarge-ondemand-1b22.vpc.cloudera.com/127.0.0.1 to
> localhost:20500 failed on connection exception: java.net.ConnectException:
> Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
> E CAUSED BY: ConnectException: Connection refused
> *Standard Error*
> SET
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_avro_file_format[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_thresho;
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_avro_file_format_857d7a53` CASCADE;
> -- 2022-12-16 22:28:50,516 INFO MainThread: Started query
> e74f5b79e5a239bd:99af979600000000
> SET
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_avro_file_format[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_thresho;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_avro_file_format_857d7a53`;
> -- 2022-12-16 22:28:57,091 INFO MainThread: Started query
> 934c3b1446ff501e:f7e9e85900000000
> -- 2022-12-16 22:28:57,394 INFO MainThread: Created database
> "test_avro_file_format_857d7a53" for test ID
> "query_test/test_iceberg.py::TestIcebergTable::()::test_avro_file_format[protocol:
> beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]"
> SET
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_avro_file_format[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_thresho;
> -- executing against localhost:21000
> use test_avro_file_format_857d7a53;
> -- 2022-12-16 22:28:57,396 INFO MainThread: Started query
> b44407d013839e5e:7f22ab5400000000
> SET
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_avro_file_format[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_thresho;
> SET test_replan=1;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> -- 2022-12-16 22:28:57,396 INFO MainThread: Loading query test file:
> /data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/testdata/workloads/functional-query/queries/QueryTest/iceberg-avro.test
> -- executing against localhost:21000
> select * from functional_parquet.iceberg_avro_format;
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]