[
https://issues.apache.org/jira/browse/IMPALA-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabor Kaszab resolved IMPALA-6318.
----------------------------------
Fix Version/s: Not Applicable
Resolution: Fixed
Seems not to be an issue anymore.
> Test suite may hang on test_query_cancellation_during_fetch
> -----------------------------------------------------------
>
> Key: IMPALA-6318
> URL: https://issues.apache.org/jira/browse/IMPALA-6318
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 2.11.0
> Environment: I managed to investigate this issue only once so far; it
> was hanging in some of our Jenkins build jobs.
> Reporter: Gabor Kaszab
> Assignee: Gabor Kaszab
> Priority: Major
> Labels: hangs, test_issue
> Fix For: Not Applicable
>
> Attachments: Screen Shot 2017-12-13 at 8.56.54.png
>
>
> test_query_cancellation_during_fetch steps:
> 1) Runs a query in Impala shell that quickly reaches the fetching state,
> where fetching the results would take several minutes.
> 2) While the query is running, the script polls the Impala debug page and
> waits until the query reaches the "FINISHED" state. This state means that
> the results are ready for fetching. (The polling gives up after 15 tries.)
> 3) Once the query reaches the "FINISHED" state, a SIGINT (CTRL-C) is sent
> to Impala shell to cancel the query.
> 4) Query output is fetched and verified.
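A minimal sketch of the bounded polling in step 2), for illustration only. The function name, the get_state callable (a stand-in for the debug page lookup) and the delay are assumptions, not the actual test code; only the 15-try limit comes from the description above.

```python
import time


def wait_for_state(get_state, target="FINISHED", max_tries=15, delay_s=1.0):
    """Poll get_state() until it returns target.

    Returns True as soon as the target state is seen, or False once
    max_tries polls have been made without seeing it.
    """
    for _ in range(max_tries):
        if get_state() == target:
            return True
        time.sleep(delay_s)
    return False
```

With a hard upper bound like this, the polling step on its own cannot wait forever, so a hang has to come from somewhere else in the test.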
> Initial assumption
> =============
> My initial assumption was that the query was stuck in step 2), waiting for
> the desired query state, and that the retry threshold somehow wasn't
> applied. However, when I checked the Impala debug page, the query had moved
> from in-flight to completed with 2048 rows already fetched (see the
> attached screenshot). Impala logs also show that the query had been
> cancelled.
> {code:java}
> I1209 08:29:35.281550 18194 coordinator.cc:99] Exec()
> query_id=d248bc6079f33f66:1b638a700000000 stmt=with v as (values (1 as x),
> (2), (3), (4)) select * from v, v v2, v v3, v v4, v v5, v v6, v v7, v v8, v
> v9, v v10, v v11
> {code}
> {code:java}
> I1209 08:29:35.895359 18196 query-state.cc:384] Instance completed.
> instance_id=d248bc6079f33f66:1b638a700000000 #in-flight=0 status=CANCELLED:
> Cancelled
> I1209 08:29:35.895372 18196 query-state.cc:396] Cancel:
> query_id=d248bc6079f33f66:1b638a700000000
> I1209 08:29:35.895407 18196 query-exec-mgr.cc:149] ReleaseQueryState():
> query_id=d248bc6079f33f66:1b638a700000000 refcnt=2
> I1209 08:29:35.908305 18194 query-exec-mgr.cc:149] ReleaseQueryState():
> query_id=d248bc6079f33f66:1b638a700000000 refcnt=1
> {code}
> This means that step 2) and even step 3) had finished properly and the
> query was cancelled during the fetching phase.
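For scale, a quick check of how many rows the statement in the Exec() log line would return, assuming it is a plain cross join in which the 4-row "v" is referenced 11 times (v, then v2 through v11):

```python
# Assumption: a plain cross join of 11 references to a 4-row CTE.
rows_per_reference = 4
references = 11  # v plus v2 .. v11

total_rows = rows_per_reference ** references
fetched_rows = 2048  # figure reported on the debug page

print(total_rows)                 # 4194304
print(fetched_rows / total_rows)  # fraction fetched before the cancel
```

Under that assumption, the 2048 rows seen on the debug page are roughly 0.05% of the full result, which fits a cancel arriving early in the fetch.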
> The interesting part is that when I checked the running processes on the
> host, I found an impala_shell.py process still running the query.
> {code:java}
> jenkins 18187 6223 0 Dec09 ? 00:00:00
> <path_to_impala>/Impala/shell/impala_shell.py -i localhost:21000 -q with v as
> (values (1 as x), (2), (3), (4)) select * from v, v v2, v v3, v v4, v v5, v
> v6, v v7, v v8, v v9, v v10, v v11;
> {code}
> I attached gdb to the running process, but the backtrace didn't show
> anything meaningful.
> Summary
> ============
> - The query shows as completed on the Impala debug page, with a few rows
> already fetched (as desired).
> - Impala logs show that the query had been cancelled (as desired).
> - An impala_shell.py process still shows up in 'ps -ef' and appears to be
> running the query.
> - According to 'top', no process spikes in CPU usage.
> Assumption
> ============
> As the debug page shows that the query is completed, I assume that the
> 'waiting for state' step and the actual cancellation of the query finished
> successfully, so the execution must be hanging in step 4), where the results
> are retrieved from ImpalaShell.
> {code:java}
> 1) p = ImpalaShell(args)
> 2) self.wait_for_query_state(stmt, cancel_at_state)
> 3) os.kill(p.pid(), signal.SIGINT)
> 4) result = p.get_result()
> {code}
> get_result() contains a shell_process.communicate() call that fetches the
> stdout and stderr from the underlying process. According to the Python docs,
> communicate() is not suitable when the data size is large.
> Taking into account that this query fetches and prints results for more than
> 30 minutes, the stdout of the ImpalaShell can be considered large.
> https://docs.python.org/2/library/subprocess.html
> "Note The data read is buffered in memory, so do not use this method if the
> data size is large or unlimited."
> If this is indeed the root of the issue, then a possible solution is to
> modify util.py:ImpalaShell so that an input parameter decides, when calling
> Popen, whether stdout is connected to a pipe or not connected at all. This
> would suit this test, as the stdout is not used at all and only the stderr
> is asserted on, so there is no need to get the stdout data from the
> ImpalaShell.
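A rough sketch of what such a parameter could look like, under stated assumptions: run_and_collect and capture_stdout are hypothetical names, and this is not the actual util.py:ImpalaShell code. It only illustrates choosing between a pipe and discarding stdout when calling Popen.

```python
import os
import subprocess


def run_and_collect(cmd, capture_stdout=True):
    """Run cmd and return (returncode, stdout_or_None, stderr).

    With capture_stdout=False the child's stdout goes to os.devnull, so
    communicate() only has to buffer stderr in memory.
    """
    with open(os.devnull, "wb") as devnull:
        stdout_target = subprocess.PIPE if capture_stdout else devnull
        proc = subprocess.Popen(cmd, stdout=stdout_target,
                                stderr=subprocess.PIPE)
        # communicate() returns None for any stream that was not piped.
        out, err = proc.communicate()
    return proc.returncode, out, err
```

A test that only asserts on stderr could call this with capture_stdout=False and ignore the None returned for stdout. Whether this actually removes the hang is not confirmed by the report; it follows the hypothesis above.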