[ 
https://issues.apache.org/jira/browse/IMPALA-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312869#comment-17312869
 ] 

ASF subversion and git services commented on IMPALA-10607:
----------------------------------------------------------

Commit 4ef67c21153157d552a6d5db09f4c1e15cbe8ac0 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4ef67c2 ]

IMPALA-10607: Fixed test_ctas_exprs failure for S3 build

A new test case, TestDecimalOverflowExprs::test_ctas_exprs, was added
in the patch for IMPALA-10564, but it failed in the S3 build with
the Parquet format: accessing a table failed with a complaint that
the Parquet file had an invalid file length. The table was created
by a CTAS that finished with the error "decimal expression
overflowed". Verified that the issue does not happen if the query
option s3_skip_insert_staging is set to false.
When s3_skip_insert_staging is true (the default), INSERTs writing
to S3 go directly to their final location rather than being
copied there by the coordinator. If a CTAS finishes with an error
during the INSERT, the Parquet partition file is left un-finalized,
without a file footer. Subsequent queries that attempt to access
the same table then fail with an error like "has an invalid file
length".
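
The reported length of 4 would be consistent with a file that holds
only the 4-byte Parquet magic header. The sketch below is a minimal
illustration of why such a file cannot be read, assuming only the
published Parquet layout ("PAR1" at both ends plus a 4-byte
little-endian footer length, so at least 12 bytes). The function name
check_parquet_length is hypothetical, and this is not Impala's actual
validation code.

```python
import struct

PARQUET_MAGIC = b"PAR1"
# Header magic + 4-byte footer length + trailer magic.
MIN_PARQUET_FILE_LEN = len(PARQUET_MAGIC) + 4 + len(PARQUET_MAGIC)


def check_parquet_length(data: bytes) -> str:
    """Return "ok", or a short reason the bytes cannot be a finalized Parquet file.

    Hypothetical helper for illustration; it checks structure only and
    does not parse the footer metadata.
    """
    if len(data) < MIN_PARQUET_FILE_LEN:
        return "invalid file length: %d" % len(data)
    if data[:4] != PARQUET_MAGIC or data[-4:] != PARQUET_MAGIC:
        return "missing magic"
    # The footer length is stored just before the trailing magic.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - MIN_PARQUET_FILE_LEN:
        return "footer length out of range"
    return "ok"
```

A file that was abandoned right after the header magic fails the
first check, which matches the shape of the message in the stack
trace.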

This patch fixes the issue by deleting the un-finalized file in
its final location when AppendRows() returns an error and staging
has been skipped.
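
The cleanup rule can be sketched as follows. This is a simplified
Python illustration against the local filesystem, not the C++ change
in the commit: write_partition_file, failing_append, and the
skip_staging flag are hypothetical stand-ins for the table writer,
AppendRows(), and s3_skip_insert_staging, and the footer bytes are a
placeholder.

```python
import os


def failing_append(fileobj, row):
    """Hypothetical AppendRows() stand-in that always fails."""
    raise ValueError("decimal expression overflowed")


def write_partition_file(final_path, rows, append_row, skip_staging=True):
    """Write rows to final_path; delete the partial file if appending fails.

    When skip_staging is True the file is already in the table's
    directory, so an un-finalized file must be removed explicitly.
    """
    f = open(final_path, "wb")
    try:
        f.write(b"PAR1")  # header magic is written before any rows
        for row in rows:
            append_row(f, row)
        f.write(b"<placeholder footer>PAR1")  # not a real Parquet footer
    except Exception:
        f.close()
        if skip_staging:
            # There is no staging directory to throw away, so remove the
            # un-finalized file from its final location.
            os.remove(final_path)
        raise
    else:
        f.close()
```

Calling write_partition_file(path, [1], failing_append) re-raises the
error and leaves no file behind; with skip_staging=False the 4-byte
partial file stays in place for whatever staging cleanup would
normally discard it.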

Testing:
 - Reproduced the test failure on a local box with defaultFS set to
   s3. Verified the fix by running test_ctas_exprs with defaultFS
   set to s3.
 - Passed core tests.

Change-Id: Ic2f64ab987aeada2cda41502e8c5dbbc229daefd
Reviewed-on: http://gerrit.cloudera.org:8080/17234
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> TestDecimalOverflowExprs::test_ctas_exprs failed in S3 build
> ------------------------------------------------------------
>
>                 Key: IMPALA-10607
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10607
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Wenzhe Zhou
>            Assignee: Wenzhe Zhou
>            Priority: Major
>             Fix For: Impala 4.0
>
>
> TestDecimalOverflowExprs::test_ctas_exprs failed in S3 build
> Stack trace for the S3 build:
> [https://master-03.jenkins.cloudera.com/job/impala-cdpd-master-staging-core-s3/34/]
> query_test.test_decimal_queries.TestDecimalOverflowExprs.test_ctas_exprs[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] (from pytest)
> Failing for the past 1 build (Since Failed#34 )
> Took 13 sec.
> Error Message
> ImpalaBeeswaxException: ImpalaBeeswaxException: Query aborted:Parquet file 
> s3a://impala-test-uswest2-1/test-warehouse/test_ctas_exprs_7304e515.db/overflowed_decimal_tbl_1/b74f0ce129189cf1-4c3c5bd600000000_1609291350_data.0.parq
>  has an invalid file length: 4
> Stacktrace
> query_test/test_decimal_queries.py:170: in test_ctas_exprs
> "SELECT count(*) FROM %s" % TBL_NAME_1)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:814:
>  in wrapper
> return function(*args, **kwargs)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:822:
>  in execute_query_expect_success
> result = cls.__execute_query(impalad_client, query, query_options, user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:923:
>  in __execute_query
> return impalad_client.execute(query, user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_connection.py:205:
>  in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:187:
>  in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:365:
>  in __execute_query
> self.wait_for_finished(handle)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:386:
>  in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E ImpalaBeeswaxException: ImpalaBeeswaxException:
> E Query aborted:Parquet file 
> s3a://impala-test-uswest2-1/test-warehouse/test_ctas_exprs_7304e515.db/overflowed_decimal_tbl_1/b74f0ce129189cf1-4c3c5bd600000000_1609291350_data.0.parq
>  has an invalid file length: 4
> Standard Error
> SET client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> SET sync_ddl=False;
> – executing against localhost:21000
> DROP DATABASE IF EXISTS `test_ctas_exprs_7304e515` CASCADE;
> – 2021-03-24 03:56:00,840 INFO MainThread: Started query 
> 574a532f47ac7c80:c1c62ae000000000
> SET client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> SET sync_ddl=False;
> – executing against localhost:21000
> CREATE DATABASE `test_ctas_exprs_7304e515`;
> – 2021-03-24 03:56:03,120 INFO MainThread: Started query 
> 424b970f206e271f:ade0b52400000000
> – 2021-03-24 03:56:03,121 INFO MainThread: Created database 
> "test_ctas_exprs_7304e515" for test ID 
> "query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]"
> – executing against localhost:21000
> SET decimal_v2=true;
> – 2021-03-24 03:56:03,126 INFO MainThread: Started query 
> 4545d8b9db5e9342:8b3ba57000000000
> – executing against localhost:21000
> DROP TABLE IF EXISTS `test_ctas_exprs_7304e515`.`overflowed_decimal_tbl_1`;
> – 2021-03-24 03:56:03,131 INFO MainThread: Started query 
> 2c4bc9fc85e2b8e8:05e35eed00000000
> SET client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> – executing against localhost:21000
> use functional_parquet;
> – 2021-03-24 03:56:03,135 INFO MainThread: Started query 
> 38403231c3885691:b0ba2cc400000000
> SET client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> – executing against localhost:21000
> CREATE TABLE `test_ctas_exprs_7304e515`.`overflowed_decimal_tbl_1` STORED AS 
> PARQUET AS SELECT 1 as i, cast(a*a*a as decimal (28,10)) as d_28 FROM (SELECT 
> cast(654964569154.9565 as decimal (28,7)) as a) q;
> – 2021-03-24 03:56:03,399 INFO MainThread: Started query 
> b74f0ce129189cf1:4c3c5bd600000000
> – executing against localhost:21000
> SELECT count(*) FROM `test_ctas_exprs_7304e515`.`overflowed_decimal_tbl_1`;
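
For reference, the arithmetic behind the overflow in the CTAS
statement above can be checked with Python's decimal module. This is
only a sketch of the magnitudes involved; it does not model Impala's
decimal_v2 result-type rules or the exact point at which Impala
raises the error.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # enough digits to hold the exact product

a = Decimal("654964569154.9565")
product = a * a * a

# DECIMAL(28,10) leaves 28 - 10 = 18 digits for the integer part.
max_integer_digits = 28 - 10
integer_digits = len(str(int(product)))
overflows = integer_digits > max_integer_digits
```

The product needs far more integer digits than DECIMAL(28,10) can
hold, so the CTAS is expected to fail with an overflow error.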


