[ 
https://issues.apache.org/jira/browse/IMPALA-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab resolved IMPALA-6318.
----------------------------------
    Fix Version/s: Not Applicable
       Resolution: Fixed

This no longer seems to be an issue.

> Test suite may hang on test_query_cancellation_during_fetch
> -----------------------------------------------------------
>
>                 Key: IMPALA-6318
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6318
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 2.11.0
>         Environment: I have managed to investigate this issue only once so 
> far; it was hanging in some of our Jenkins build jobs.
>            Reporter: Gabor Kaszab
>            Assignee: Gabor Kaszab
>            Priority: Major
>              Labels: hangs, test_issue
>             Fix For: Not Applicable
>
>         Attachments: Screen Shot 2017-12-13 at 8.56.54.png
>
>
> test_query_cancellation_during_fetch steps:
>   1) Runs a query in Impala shell that quickly reaches the fetching state, 
> where the fetching would take several minutes.
>   2) While the query is running, the script polls the Impala debug page and 
> waits until the query reaches the "FINISHED" state. This state means that the 
> results are ready for fetching. (The polling is limited to 15 tries.)
>   3) Once the query reaches the "FINISHED" state, a SIGINT (CTRL-C) is sent 
> to Impala shell to cancel the query.
>   4) Query output is fetched and verified.
> Initial assumption
> =============
> My initial assumption was that the query was somehow stuck in step 2) while 
> waiting for the desired query state (and that the retry threshold was somehow 
> not applied). However, when I checked the Impala debug page, the query had 
> moved from in-flight to completed with 2048 rows already fetched (see the 
> attached picture). The Impala logs also show that the query had been 
> cancelled.
> {code:java}
> I1209 08:29:35.281550 18194 coordinator.cc:99] Exec() 
> query_id=d248bc6079f33f66:1b638a700000000 stmt=with v as (values (1 as x), 
> (2), (3), (4)) select * from v, v v2, v v3, v v4, v v5, v v6, v v7, v v8, v 
> v9, v v10, v v11
> {code}
> {code:java}
> I1209 08:29:35.895359 18196 query-state.cc:384] Instance completed. 
> instance_id=d248bc6079f33f66:1b638a700000000 #in-flight=0 status=CANCELLED: 
> Cancelled
> I1209 08:29:35.895372 18196 query-state.cc:396] Cancel: 
> query_id=d248bc6079f33f66:1b638a700000000
> I1209 08:29:35.895407 18196 query-exec-mgr.cc:149] ReleaseQueryState(): 
> query_id=d248bc6079f33f66:1b638a700000000 refcnt=2
> I1209 08:29:35.908305 18194 query-exec-mgr.cc:149] ReleaseQueryState(): 
> query_id=d248bc6079f33f66:1b638a700000000 refcnt=1
> {code}
> This means that step 2) and even step 3) had finished properly and the 
> query was cancelled during the fetching phase.
> The interesting part is that when I checked the running processes on the 
> host, I observed a running impala_shell.py that was executing the query.
> {code:java}
> jenkins  18187  6223  0 Dec09 ?        00:00:00 
> <path_to_impala>/Impala/shell/impala_shell.py -i localhost:21000 -q with v as 
> (values (1 as x), (2), (3), (4)) select * from v, v v2, v v3, v v4, v v5, v 
> v6, v v7, v v8, v v9, v v10, v v11;
> {code}
> I attached gdb to the running process, but the backtrace didn't show 
> anything meaningful.
> Summary
> ============
>   - The query shows as completed on the Impala debug page, with a few rows 
> already fetched (as desired).
>   - Impala logs show that the query had been cancelled (as desired).
>   - An impala_shell.py that seems to be running the query still shows up in 
> 'ps -ef'.
>   - According to 'top', no process spikes in CPU usage.
> Assumption
> ============
> As the debug page shows that the query is completed, I assume that the 
> 'waiting for state' step and the actual cancellation of the query finished 
> successfully, so the execution must be hanging at step 4), where the results 
> are retrieved from ImpalaShell.
> {code:java}
> 1) p = ImpalaShell(args)
> 2) self.wait_for_query_state(stmt, cancel_at_state)
> 3) os.kill(p.pid(), signal.SIGINT)
> 4) result = p.get_result()
> {code}
> The get_result() contains a shell_process.communicate() call that fetches the 
> stdout and stderr from the underlying process. According to the Python docs, 
> communicate() buffers the data it reads in memory, so it seems that it 
> doesn't work well when the data size is big.
> Taking into account that this query fetches and prints results for more than 
> 30 minutes, we can consider the stdout of the ImpalaShell large.
> https://docs.python.org/2/library/subprocess.html
> "Note The data read is buffered in memory, so do not use this method if the 
> data size is large or unlimited."
> If this is indeed the root of the issue, then a possible solution is to 
> modify util.py:ImpalaShell so that an input parameter decides, when calling 
> Popen, whether stdout is connected to a pipe or not connected at all. This 
> would suit this test, as the stdout is not used at all and only the stderr is 
> asserted on, so there is no need to get the stdout data from the ImpalaShell.
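
The change proposed in the description could look roughly like the sketch below. It is only an illustration, not the actual util.py:ImpalaShell code: run_command and capture_stdout are hypothetical names, the child process is a stand-in that prints many rows to stdout and a placeholder message to stderr, and it assumes Python 3 (subprocess.DEVNULL is not available in Python 2).

```python
import subprocess
import sys


def run_command(args, capture_stdout=True):
    """Run args and return (stdout, stderr, returncode).

    Hypothetical helper for illustration only.
    """
    # When the caller does not need stdout, discard it instead of piping it,
    # so communicate() has nothing large to buffer in memory.
    stdout_target = subprocess.PIPE if capture_stdout else subprocess.DEVNULL
    proc = subprocess.Popen(
        args,
        stdout=stdout_target,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # communicate() returns None for any stream that was not piped.
    out, err = proc.communicate()
    return out, err, proc.returncode


# Stand-in child: many rows on stdout, a placeholder message on stderr.
child = [
    sys.executable,
    "-c",
    "import sys\n"
    "for i in range(100000):\n"
    "    print(i)\n"
    "sys.stderr.write('query cancelled\\n')\n",
]

out, err, rc = run_command(child, capture_stdout=False)
```

With capture_stdout=False only stderr is collected, which matches a test that asserts on stderr alone. Whether this would actually have removed the hang was not confirmed in the issue.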



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
