[
https://issues.apache.org/jira/browse/IMPALA-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Csaba Ringhofer updated IMPALA-13374:
-------------------------------------
Description:
There are several issues with the current way runtime profiles are downloaded
in impala-shell:
https://github.com/apache/impala/blob/874e4fa117bdccfb8784c1987e5e3bf1ef4fbc1d/shell/impala_shell.py#L1463
1. The profile is fetched AFTER the queries are closed, which means that Impala
may have already discarded it from memory, in which case the RPC will return an
error.
(Closing the query happens at a different point depending on is_dml, but in
both cases it happens before the profile is fetched.)
2. If show_profiles=true, then failing to fetch the profile is treated as an
error. In interactive sessions this only prints an error message, but with the
-q or -f option the shell stops executing queries and exits with a non-zero
status.
3. The profile is fetched from Impala even if it is not used at all
(show_profiles=false, which is the default). This is not a functional bug but
can impact performance.
4. The downloaded profile is not cached, so a subsequent PROFILE; command will
download it again. This is not just an optimization issue, but may lead to
script failures if the profile is already discarded when PROFILE; is called.
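The caching idea in point 4 could look roughly like the sketch below. It is
not the actual impala-shell code: ProfileCache and the fetch callable are
hypothetical names used only for illustration, and the fetch callable stands
in for whatever RPC wrapper the shell really uses.

```python
class ProfileCache:
    """Remembers the most recently downloaded profile, keyed by query id.

    Hypothetical helper, not part of impala-shell.
    """

    def __init__(self, fetch):
        # fetch: callable that takes a query id and returns the profile text.
        # It is an assumed stand-in for the shell's real RPC wrapper.
        self._fetch = fetch
        self._query_id = None
        self._profile = None

    def get(self, query_id):
        # Contact the server only if nothing is cached for this query id.
        if self._profile is None or query_id != self._query_id:
            self._profile = self._fetch(query_id)
            self._query_id = query_id
        return self._profile


# Usage with a fake fetch function that records every download.
calls = []


def fake_fetch(query_id):
    calls.append(query_id)
    return "profile for " + query_id


cache = ProfileCache(fake_fetch)
first = cache.get("q1")
second = cache.get("q1")  # served from the cache, no second download
```

With such a cache, a PROFILE; command issued right after the query would
reuse the text that was already downloaded instead of depending on the
coordinator still holding the profile.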
Note that the "already discarded" case has special handling in the SUMMARY
command (but not in PROFILE): if the query id is not found, it is not treated
as an error.
https://github.com/apache/impala/blob/874e4fa117bdccfb8784c1987e5e3bf1ef4fbc1d/shell/impala_shell.py#L820
The main problem is the combination of 1 and 2, as it can lead to failures if
show_profiles=true, even when everything works as expected and the coordinator
discards the profile between close and get_runtime_profile.
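One possible way to avoid the combination of 1 and 2 is sketched below: fetch
the profile while the query is still open, and downgrade a failed fetch to a
warning. This is only an illustration under assumptions, not the fix that was
applied to impala-shell. get_runtime_profile is the name used above;
close_query, ProfileUnavailable, run_query and FakeClient are hypothetical.

```python
import sys


class ProfileUnavailable(Exception):
    """Hypothetical error: the server no longer has the profile."""


def run_query(client, handle, show_profiles, warn=sys.stderr.write):
    """Return the profile text, or None if it was not requested or is gone."""
    profile = None
    try:
        if show_profiles:
            try:
                # Fetch while the query is still open, so the coordinator
                # cannot have discarded the profile yet.
                profile = client.get_runtime_profile(handle)
            except ProfileUnavailable as e:
                # A missing profile is reported but does not fail the query.
                warn("WARNING: could not fetch profile: %s\n" % e)
    finally:
        # The query is closed only after the fetch attempt.
        client.close_query(handle)
    return profile


class FakeClient:
    """Stand-in client that records the order of calls."""

    def __init__(self, fail=False):
        self.calls = []
        self.fail = fail

    def get_runtime_profile(self, handle):
        self.calls.append("get_runtime_profile")
        if self.fail:
            raise ProfileUnavailable("query id not found")
        return "profile text"

    def close_query(self, handle):
        self.calls.append("close_query")
```

In this sketch a script run with -q or -f would keep going after a failed
profile fetch, and nothing is fetched at all when show_profiles=false.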
UPDATE:
Since IMPALA-13556, impala-shell no longer tries to download the profile if
show_profiles=false.
> impala-shell can hit errors when downloading runtime profile
> ------------------------------------------------------------
>
> Key: IMPALA-13374
> URL: https://issues.apache.org/jira/browse/IMPALA-13374
> Project: IMPALA
> Issue Type: Bug
> Components: Clients
> Reporter: Csaba Ringhofer
> Priority: Critical