[ 
https://issues.apache.org/jira/browse/IMPALA-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-13374:
-------------------------------------
    Description: 
There are several issues with the current way runtime profiles are downloaded 
in impala-shell: 
https://github.com/apache/impala/blob/874e4fa117bdccfb8784c1987e5e3bf1ef4fbc1d/shell/impala_shell.py#L1463

1. The profile is fetched AFTER the queries are closed, which means that Impala 
may have already discarded it from memory, in which case the RPC will return an 
error. 
(Closing the query happens at different points depending on is_dml, but in 
both cases it happens before the profile is fetched.)

2. If show_profiles=true, then failing to fetch the profile is treated as an 
error. In interactive sessions this only prints an error message, but with the 
-q or -f option impala-shell stops executing the remaining queries and returns 
a non-zero exit status.

3. The profile is fetched from Impala even if it is not used at all 
(show_profiles=false, which is the default). This is not a functional bug but 
can impact performance.

4. The downloaded profile is not cached, so a subsequent PROFILE; command will 
download it again. This is not just an optimization issue, but may lead to 
script failures if the profile is already discarded when PROFILE; is called.

Note that the "already discarded" case has special handling in the SUMMARY 
command (but not in PROFILE): if the query id is not found, it is not treated 
as an error.
https://github.com/apache/impala/blob/874e4fa117bdccfb8784c1987e5e3bf1ef4fbc1d/shell/impala_shell.py#L820
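
A minimal sketch of what the same tolerance could look like for PROFILE. It is 
not the impala-shell code: QueryNotFoundError, fetch_profile_or_none and the 
fetch callable are hypothetical names that stand in for the real RPC and its 
error type.

```python
import sys


class QueryNotFoundError(Exception):
    """Hypothetical error: the coordinator no longer knows the query id."""


def fetch_profile_or_none(fetch, query_id, warn=sys.stderr.write):
    """Return the profile text, or None if the query id is no longer known.

    `fetch` is any callable that takes a query id; it is a stand-in for the
    real profile RPC, whose signature is not assumed here.
    """
    try:
        return fetch(query_id)
    except QueryNotFoundError:
        # Mirror the SUMMARY behaviour described above: warn, do not fail.
        warn("Warning: runtime profile for query %s is no longer available\n"
             % query_id)
        return None
```

With this shape, a script run with -q or -f would continue after the warning 
instead of stopping on the first missing profile.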

The main problem is the combination of 1 and 2, as it can lead to failures if 
show_profiles=true, even when everything works as expected and the coordinator 
discards the profile between close and get_runtime_profile.
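
One possible way to address 1, 2 and 4 together is to fetch the profile while 
the query is still open, keep it for a later PROFILE; command, and downgrade a 
failed fetch to a warning. The sketch below only illustrates that ordering. 
ProfileCache, ProfileFetchError, finish_query and the client object with its 
get_runtime_profile/close_query methods are assumptions made for the example, 
not the actual impala-shell classes or signatures.

```python
import sys


class ProfileFetchError(Exception):
    """Hypothetical error raised when the profile RPC fails."""


class ProfileCache:
    """Hypothetical helper that remembers the profile of the last query."""

    def __init__(self):
        self._query_id = None
        self._profile = None

    def store(self, query_id, profile):
        self._query_id = query_id
        self._profile = profile

    def lookup(self, query_id):
        # Only the most recently stored query's profile is kept.
        return self._profile if query_id == self._query_id else None


def finish_query(client, query_id, cache, show_profiles):
    """Fetch the profile (if wanted) BEFORE closing the query.

    `client` is assumed to expose get_runtime_profile(query_id) and
    close_query(query_id); both are stand-ins for the real client calls.
    """
    try:
        if show_profiles:
            try:
                cache.store(query_id, client.get_runtime_profile(query_id))
            except ProfileFetchError as e:
                # A missing profile is reported but does not fail the query.
                sys.stderr.write(
                    "Warning: could not fetch runtime profile: %s\n" % e)
    finally:
        # The query is closed only after the fetch attempt, even on error.
        client.close_query(query_id)
    return cache.lookup(query_id)
```

A later PROFILE; command could then call cache.lookup(query_id) first and only 
go to the coordinator when it returns None.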

UPDATE:
Since IMPALA-13556, impala-shell no longer tries to download the profile if 
show_profiles=false, which addresses item 3.


> impala-shell can hit errors when downloading runtime profile
> ------------------------------------------------------------
>
>                 Key: IMPALA-13374
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13374
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Clients
>            Reporter: Csaba Ringhofer
>            Priority: Critical


