Csaba Ringhofer created IMPALA-13374:
----------------------------------------
Summary: impala-shell can hit errors when downloading runtime profile
Key: IMPALA-13374
URL: https://issues.apache.org/jira/browse/IMPALA-13374
Project: IMPALA
Issue Type: Bug
Components: Clients
Reporter: Csaba Ringhofer
There are several issues with the current way runtime profiles are downloaded
in impala-shell:
https://github.infra.cloudera.com/CDH/Impala/blob/2010c93bd364795d4ee7d17ea8805450658fc485/shell/impala_shell.py#L1196
1. The profile is fetched AFTER the queries are closed, which means that Impala
may have already discarded it from memory, in which case the RPC will return an
error.
(Closing the query happens at a different point depending on is_dml, but in
both cases it happens before the profile is fetched.)
2. If show_profiles=true, then failing to fetch the profile is treated as an
error. In interactive sessions this only produces an error message, but with
the -q or -f parameter the shell stops executing queries and returns a non-zero
exit status.
3. The profile is fetched from Impala even if it is not used at all
(show_profiles=false, which is the default). This is not a functional bug but
can impact performance.
4. The downloaded profile is not cached, so a subsequent PROFILE; command will
download it again. This is not just an optimization issue, but may lead to
script failures if the profile is already discarded when PROFILE; is called.
Note that the "already discarded" case has special handling for the SUMMARY
command (but not for PROFILE): if the query id is not found, it is not treated
as an error.
https://github.infra.cloudera.com/CDH/Impala/blob/2010c93bd364795d4ee7d17ea8805450658fc485/shell/impala_shell.py#L684
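One possible way to address item 4 and give PROFILE the same tolerance that
SUMMARY already has is sketched below. This is only an illustration, not the
impala-shell code: ProfileCache, ProfileUnavailable and the fetch callback are
hypothetical names introduced here, and the real shell may structure this
differently.

```python
class ProfileUnavailable(Exception):
    """Hypothetical error: the server no longer holds the profile."""


class ProfileCache:
    """Remembers the most recently downloaded profile, keyed by query id."""

    def __init__(self):
        self._query_id = None
        self._profile = None

    def store(self, query_id, profile):
        self._query_id = query_id
        self._profile = profile

    def get(self, query_id, fetch):
        """Return the cached profile, or fall back to fetch(query_id).

        Returns None instead of raising if the profile is already gone,
        so the caller can print a notice rather than fail the script.
        """
        if query_id == self._query_id and self._profile is not None:
            return self._profile
        try:
            profile = fetch(query_id)
        except ProfileUnavailable:
            return None
        self.store(query_id, profile)
        return profile
```

With a cache like this, a PROFILE; command issued right after a query would
reuse the already downloaded text and would not depend on the coordinator
still holding the profile.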
The main problem is the combination of 1 and 2: with show_profiles=true, a
query can fail even though everything worked as expected, simply because the
coordinator discarded the profile between close and get_runtime_profile.
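A minimal sketch of how the combination of 1 and 2 could be avoided: fetch the
profile before closing the query, fetch it only when show_profiles is set, and
downgrade a fetch failure to a warning. FakeClient, finish_query and the warn
callback are assumptions made for this example and do not reflect the actual
impala-shell client interface.

```python
import sys


class FakeClient:
    """Stand-in for a server connection; forgets the profile on close."""

    def __init__(self):
        self.closed = set()

    def get_runtime_profile(self, query_id):
        if query_id in self.closed:
            raise RuntimeError("query id not found: " + query_id)
        return "profile of " + query_id

    def close_query(self, query_id):
        self.closed.add(query_id)


def finish_query(client, query_id, show_profiles, warn=None):
    """Fetch the profile (only if wanted) BEFORE closing the query.

    A failed fetch is reported through warn() and never raised, so a
    -q / -f style run would keep going. The query is closed either way.
    """
    if warn is None:
        warn = lambda msg: print(msg, file=sys.stderr)
    profile = None
    if show_profiles:
        try:
            profile = client.get_runtime_profile(query_id)
        except RuntimeError as e:
            warn("WARNING: could not fetch profile: %s" % e)
    client.close_query(query_id)
    return profile
```

In this sketch the show_profiles=false path issues no profile RPC at all, which
also covers item 3.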