[ 
https://issues.apache.org/jira/browse/IMPALA-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253288#comment-17253288
 ] 

ASF subversion and git services commented on IMPALA-10336:
----------------------------------------------------------

Commit 6b292bdd1527ec5501685c4564145d2a725195d9 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6b292bd ]

IMPALA-10336: Coordinator return incorrect error to client

Due to a race condition, the coordinator could set the execution status
to "RPC aborted due to cancellation". This internal error should not be
returned to the client.
This patch fixes the issue by setting the backend status to CANCELLED
instead of ABORTED if the exec RPC was aborted due to cancellation.
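
The status mapping described above can be sketched roughly as follows. This
is a toy Python model, not Impala's C++ code: the names Code,
backend_status_after_exec_rpc and client_visible_status are hypothetical, and
the aggregation rule (the first status that is neither OK nor CANCELLED wins)
is an assumption made only for illustration.

```python
from enum import Enum


class Code(Enum):
    # Hypothetical status codes for this sketch only.
    OK = 0
    CANCELLED = 1
    ABORTED = 2
    GENERAL_ERROR = 3


def backend_status_after_exec_rpc(rpc_code, cancel_requested):
    """Pick the backend status to record after the exec RPC completes.

    An RPC that was aborted only because the query was already being
    cancelled is recorded as CANCELLED rather than as an internal ABORTED
    error.
    """
    if rpc_code is Code.ABORTED and cancel_requested:
        return Code.CANCELLED
    return rpc_code


def client_visible_status(backend_codes):
    """Toy aggregation (assumption): the first status that is neither OK
    nor CANCELLED wins; otherwise CANCELLED if present; otherwise OK."""
    for code in backend_codes:
        if code not in (Code.OK, Code.CANCELLED):
            return code
    if Code.CANCELLED in backend_codes:
        return Code.CANCELLED
    return Code.OK


# Without the mapping, the aborted RPC can win the race against the real error.
before = [Code.ABORTED, Code.GENERAL_ERROR]
# With the mapping, the cancelled RPC no longer masks the real error.
after = [backend_status_after_exec_rpc(Code.ABORTED, True), Code.GENERAL_ERROR]
print(client_visible_status(before), client_visible_status(after))
```

In this model the client sees the real error once the cancelled RPC is
recorded as CANCELLED, which matches the intent stated in the commit message.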

Testing:
 - Manual tests
   Since this is a racy bug, I could only reproduce the situation by
   adding artificial delays in 3 places: QueryExecMgr::StartQuery(),
   Coordinator::UpdateBackendExecStatus(), and
   Coordinator::StartBackendExec(), while running the test case
   test_scanners.py::TestOrc::test_type_conversions_hive3.
   Verified that the issue no longer happened after applying this patch
   by running test_scanners.py::TestOrc::test_type_conversions_hive3
   in a loop for hours.
 - Passed exhaustive tests.
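
The general technique of forcing a race with artificial delays can be
illustrated as below. This is a minimal Python sketch under stated
assumptions, unrelated to the actual Impala code paths: run_race and the
status strings are hypothetical, and the 0.1 s and 0.5 s delays are arbitrary
values chosen to make the ordering reliable.

```python
import threading
import time


def run_race(delay_real_error_s):
    """Return the first status recorded when two reporters race.

    One thread reports the real error after `delay_real_error_s` seconds;
    the other reports an aborted RPC after a fixed 0.1 s. Only the first
    report is kept, mimicking a "first status wins" rule.
    """
    first = []
    lock = threading.Lock()

    def report(status, delay_s):
        time.sleep(delay_s)  # artificial delay that widens the race window
        with lock:
            if not first:
                first.append(status)

    threads = [
        threading.Thread(target=report, args=("real error", delay_real_error_s)),
        threading.Thread(target=report, args=("rpc aborted", 0.1)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return first[0]


print(run_race(0.0))  # real error is reported first
print(run_race(0.5))  # the delay lets the aborted RPC win the race
```

Delaying one side makes the rarely seen ordering happen on every run, which
is what makes a fix for such a bug verifiable.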

Change-Id: I75f252e43006c6ff6980800e3254672de396b318
Reviewed-on: http://gerrit.cloudera.org:8080/16849
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> test_type_conversions_hive3 fails because incorrect error is returned to 
> client
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-10336
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10336
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>            Reporter: Tim Armstrong
>            Assignee: Wenzhe Zhou
>            Priority: Blocker
>              Labels: broken-build
>             Fix For: Impala 4.0
>
>         Attachments: coordinator_logs.txt, 
> impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1742.vpc.cloudera.com.jenkins.log.INFO.20201113-170705.10502.gz,
>  
> impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1742.vpc.cloudera.com.jenkins.log.INFO.20201113-170705.10506.gz,
>  
> impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1742.vpc.cloudera.com.jenkins.log.INFO.20201113-170705.10512.gz
>
>
> {noformat}
> Regression
> query_test.test_scanners.TestOrc.test_type_conversions_hive3[protocol: 
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> orc/def/block] (from pytest)
> Failing for the past 1 build (Since Failed#102 )
> Took 1 min 43 sec.
> Error Message
> query_test/test_scanners.py:1538: in test_type_conversions_hive3     
> self.run_test_case('DataErrorsTest/orc-type-checks', vector, unique_database) 
> common/impala_test_suite.py:668: in run_test_case     
> self.__verify_exceptions(test_section['CATCH'], str(e), use_db) 
> common/impala_test_suite.py:485: in __verify_exceptions     (expected_str, 
> actual_str) E   AssertionError: Unexpected exception string. Expected: Type 
> mismatch: table column TINYINT is map to column smallint in ORC file E   Not 
> found in actual: ImpalaBeeswaxException: Query aborted:ExecQueryFInstances 
> rpc query_id=8f461cf08845e57c:32ec8ff300000000 failed: Exec() rpc failed: 
> Aborted: ExecQueryFInstances RPC to 127.0.0.1:27002 is cancelled in state SENT
> Stacktrace
> query_test/test_scanners.py:1538: in test_type_conversions_hive3
>     self.run_test_case('DataErrorsTest/orc-type-checks', vector, 
> unique_database)
> common/impala_test_suite.py:668: in run_test_case
>     self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
> common/impala_test_suite.py:485: in __verify_exceptions
>     (expected_str, actual_str)
> E   AssertionError: Unexpected exception string. Expected: Type mismatch: 
> table column TINYINT is map to column smallint in ORC file
> E   Not found in actual: ImpalaBeeswaxException: Query 
> aborted:ExecQueryFInstances rpc query_id=8f461cf08845e57c:32ec8ff300000000 
> failed: Exec() rpc failed: Aborted: ExecQueryFInstances RPC to 
> 127.0.0.1:27002 is cancelled in state SENT
> ....
> -- 2020-11-13 23:05:01,187 INFO     MainThread: Started query 
> 62432dd724112633:bd27d62200000000
> -- executing against localhost:21000
> select c4 from illtypes;
> -- 2020-11-13 23:05:01,243 INFO     MainThread: Started query 
> 8f461cf08845e57c:32ec8ff300000000
> {noformat}
> The problem seems to be that the error that occurs during the Prepare() phase
> is not propagated correctly, and a different internal error wins a race to
> become the status of the query.
> Attached executor logs and the coordinator logs for the query id. The full 
> coordinator log was too big to attach so I put it here: 
> https://drive.google.com/file/d/1hZVi1AmRJ4Lz7yIeJqVpiXeA_MZTP1it/view?usp=sharing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
