[
https://issues.apache.org/jira/browse/IMPALA-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308405#comment-17308405
]
Wenzhe Zhou commented on IMPALA-10607:
--------------------------------------
When I tried to read the table after the CTAS failed, I got the following error: "Query
aborted:Parquet file
s3a://impala-test-uswest2-1/test-warehouse/test_ctas_exprs_7304e515.db/overflowed_decimal_tbl_1/b74f0ce129189cf1-4c3c5bd600000000_1609291350_data.0.parq
has an invalid file length: 4". The failed CTAS left a corrupt table on S3.
It seems the Parquet file is not finalized on S3 when the query is aborted.
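For context, a Parquet file starts with the 4-byte magic `PAR1` and ends with a footer made of the serialized file metadata, a 4-byte little-endian metadata length, and a second `PAR1`. A length of exactly 4 bytes is consistent with a writer that emitted only the leading magic and was never finalized (an inference, not something the log confirms). The sketch below is a simplified layout check written against the format spec, not Impala's scanner code; `validate_parquet_layout` is a hypothetical name.

```python
import struct

PARQUET_MAGIC = b"PAR1"
# Smallest possible layout: leading magic + 4-byte footer length + trailing magic.
MIN_FILE_LEN = len(PARQUET_MAGIC) + 4 + len(PARQUET_MAGIC)


def validate_parquet_layout(data: bytes) -> str:
    """Hypothetical sketch: return "ok" or a reason the byte layout is invalid."""
    if len(data) < MIN_FILE_LEN:
        return "invalid file length: %d" % len(data)
    if not data.startswith(PARQUET_MAGIC) or not data.endswith(PARQUET_MAGIC):
        return "missing PAR1 magic"
    # The 4 bytes just before the trailing magic hold the footer (metadata) length.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - MIN_FILE_LEN:
        return "footer length %d exceeds file size" % footer_len
    return "ok"


# An unfinalized file that holds only the leading magic is 4 bytes long.
print(validate_parquet_layout(b"PAR1"))  # invalid file length: 4
```

A real reader also parses the footer metadata; this sketch stops at the byte-level checks that a 4-byte file already fails.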
That sounds like a bug. It's low priority since the table isn't expected to
have meaningful contents anyway.
Before the patch for IMPALA-10564 was merged, a CTAS that selects from another
source table (for example, create table t11 as select id, cast(a*b*c as
decimal(28,10)) from t10) failed when there was a decimal overflow. I verified
that we got the same error on S3 when accessing the table after that CTAS
failed, so this is NOT a new issue.
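To see why the test's CTAS overflows: DECIMAL(28,10) keeps 10 of its 28 digits after the decimal point, leaving at most 18 integer digits, while the test cubes a value of roughly 6.5e11. The sketch below uses Python's `decimal` module as a stand-in for Impala's decimal arithmetic. It does not model Impala's intermediate result-type rules, and `integer_digits`/`fits_decimal` are hypothetical helpers.

```python
from decimal import Decimal, getcontext

# Enough precision to hold the exact cube (a has 16 significant digits, so a**3 has at most 48).
getcontext().prec = 60


def integer_digits(value: Decimal) -> int:
    """Number of digits before the decimal point (for |value| >= 1)."""
    return value.copy_abs().adjusted() + 1


def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    """True if the integer part of value fits DECIMAL(precision, scale)."""
    return integer_digits(value) <= precision - scale


a = Decimal("654964569154.9565")  # literal from the failing CTAS
cube = a * a * a
print(integer_digits(cube), fits_decimal(cube, 28, 10))  # 36 False
```

The input itself fits DECIMAL(28,7) (12 integer digits out of 21 allowed), so the overflow comes from the multiplication, not the cast of the literal.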
When HdfsParquetTableWriter::AppendRows() returns an error,
HdfsTableSink::WriteRowsToPartition() returns the error without calling
HdfsTableSink::FinalizePartitionFile(), so
HdfsParquetTableWriter::Finalize() is never called. This can leave a corrupt
data file. The issue is tricky to fix: if HdfsParquetTableWriter::Finalize()
is called, NULL values will be written to the table, but we don't expect the
failed statement to insert anything into the table.
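The failure path described above can be modeled with a toy writer. This is a simplified Python sketch, not Impala's C++ code: `ToyParquetWriter`, `write_rows_to_partition`, and the assumption that the `PAR1` magic is written as soon as the file is opened are hypothetical, chosen only to mirror the names in the comment. It shows how returning early when `append_rows()` fails skips `finalize()` and leaves a file holding only the 4-byte header.

```python
import io
import struct

MAGIC = b"PAR1"


class ToyParquetWriter:
    """Hypothetical stand-in for HdfsParquetTableWriter."""

    def __init__(self, sink: io.BytesIO):
        self.sink = sink
        self.rows = []
        self.sink.write(MAGIC)  # assumed: header magic written when the file is opened

    def append_rows(self, rows):
        """Buffer rows in memory; return an error string on a bad row."""
        for row in rows:
            if row is None:  # stands in for a decimal overflow error
                return "decimal overflow"
            self.rows.append(row)
        return None

    def finalize(self):
        """Write a stand-in footer: metadata bytes, their length, trailing magic."""
        footer = repr(self.rows).encode()
        self.sink.write(footer + struct.pack("<I", len(footer)) + MAGIC)


def write_rows_to_partition(rows):
    """Mirrors the reported flow: return on error without finalizing the file."""
    sink = io.BytesIO()
    writer = ToyParquetWriter(sink)
    err = writer.append_rows(rows)
    if err is not None:
        return err, sink.getvalue()  # finalize() is skipped on this path
    writer.finalize()
    return None, sink.getvalue()


err, data = write_rows_to_partition([1, None])
print(err, len(data))  # decimal overflow 4
```

As the comment points out, calling `finalize()` on the error path is not a clean fix either, since it would write data (NULLs) into a table that the failed statement should not populate.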
> TestDecimalOverflowExprs::test_ctas_exprs failed in S3 build
> ------------------------------------------------------------
>
> Key: IMPALA-10607
> URL: https://issues.apache.org/jira/browse/IMPALA-10607
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.0
> Reporter: Wenzhe Zhou
> Assignee: Wenzhe Zhou
> Priority: Major
>
> TestDecimalOverflowExprs::test_ctas_exprs failed in S3 build
> Stack trace:
> Stack trace for S3 build.
> [https://master-03.jenkins.cloudera.com/job/impala-cdpd-master-staging-core-s3/34/]
> query_test.test_decimal_queries.TestDecimalOverflowExprs.test_ctas_exprs[protocol:
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none] (from pytest)
> Failing for the past 1 build (Since Failed#34 )
> Took 13 sec.
> Error Message
> ImpalaBeeswaxException: ImpalaBeeswaxException: Query aborted:Parquet file
> s3a://impala-test-uswest2-1/test-warehouse/test_ctas_exprs_7304e515.db/overflowed_decimal_tbl_1/b74f0ce129189cf1-4c3c5bd600000000_1609291350_data.0.parq
> has an invalid file length: 4
> Stacktrace
> query_test/test_decimal_queries.py:170: in test_ctas_exprs
> "SELECT count(*) FROM %s" % TBL_NAME_1)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:814:
> in wrapper
> return function(*args, **kwargs)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:822:
> in execute_query_expect_success
> result = cls.__execute_query(impalad_client, query, query_options, user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:923:
> in __execute_query
> return impalad_client.execute(query, user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_connection.py:205:
> in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:187:
> in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:365:
> in __execute_query
> self.wait_for_finished(handle)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:386:
> in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E ImpalaBeeswaxException: ImpalaBeeswaxException:
> E Query aborted:Parquet file
> s3a://impala-test-uswest2-1/test-warehouse/test_ctas_exprs_7304e515.db/overflowed_decimal_tbl_1/b74f0ce129189cf1-4c3c5bd600000000_1609291350_data.0.parq
> has an invalid file length: 4
> Standard Error
> SET
> client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_ctas_exprs_7304e515` CASCADE;
> -- 2021-03-24 03:56:00,840 INFO MainThread: Started query
> 574a532f47ac7c80:c1c62ae000000000
> SET
> client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_ctas_exprs_7304e515`;
> -- 2021-03-24 03:56:03,120 INFO MainThread: Started query
> 424b970f206e271f:ade0b52400000000
> -- 2021-03-24 03:56:03,121 INFO MainThread: Created database
> "test_ctas_exprs_7304e515" for test ID
> "query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]"
> -- executing against localhost:21000
> SET decimal_v2=true;
> -- 2021-03-24 03:56:03,126 INFO MainThread: Started query
> 4545d8b9db5e9342:8b3ba57000000000
> -- executing against localhost:21000
> DROP TABLE IF EXISTS `test_ctas_exprs_7304e515`.`overflowed_decimal_tbl_1`;
> -- 2021-03-24 03:56:03,131 INFO MainThread: Started query
> 2c4bc9fc85e2b8e8:05e35eed00000000
> SET
> client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> -- executing against localhost:21000
> use functional_parquet;
> -- 2021-03-24 03:56:03,135 INFO MainThread: Started query
> 38403231c3885691:b0ba2cc400000000
> SET
> client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> CREATE TABLE `test_ctas_exprs_7304e515`.`overflowed_decimal_tbl_1` STORED AS
> PARQUET AS SELECT 1 as i, cast(a*a*a as decimal (28,10)) as d_28 FROM (SELECT
> cast(654964569154.9565 as decimal (28,7)) as a) q;
> -- 2021-03-24 03:56:03,399 INFO MainThread: Started query
> b74f0ce129189cf1:4c3c5bd600000000
> -- executing against localhost:21000
> SELECT count(*) FROM `test_ctas_exprs_7304e515`.`overflowed_decimal_tbl_1`;