[jira] [Commented] (IMPALA-7665) Bringing up stopped statestore causes queries to fail

Bikramjeet Vig (JIRA) Wed, 24 Apr 2019 13:49:32 -0700


    [ https://issues.apache.org/jira/browse/IMPALA-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825496#comment-16825496 ]


Bikramjeet Vig commented on IMPALA-7665:
----------------------------------------

A statestore restart can cause transient failures in any mechanism that depends 
on statestore updates. This is because, after a restart, there is a short window 
during which the updates from the statestore are incomplete. Consider the 
following topics and how an incomplete state would affect Impala:

- impala-membership: causes cancellation of in-flight queries (what this JIRA 
will fix). In addition, the scheduler would work with a small subset of the 
cluster, or even an empty cluster, which results in reduced query performance 
and even failure of new queries (no backend available to schedule on). The 
scheduler issue can be fixed by always adding the current (self) impalad to the 
list, but that would affect query performance and would still be a problem for 
coordinator-only hosts.

- impala-request-queue: an incomplete admission control state can result in 
over-admission of queries.

- catalog-update: can result in failure to plan queries, causing new queries to 
fail. This is not a problem, however, when using the "local catalog" 
(catalog V2).
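To make the impala-membership failure mode concrete, here is a minimal hypothetical Python model. None of its names (`Query`, `on_membership_update`, `schedulable_backends`) come from the Impala code base, and it assumes a coordinator that treats any backend missing from the latest membership snapshot as failed. It also assumes that, shortly after the restart, only one of the three impalads from the reproduction has re-registered; under those assumptions it produces the same kind of "unreachable impalad(s)" cancellation, and it shows why always adding the local impalad does not help a coordinator-only host.

```python
# Hypothetical model, not Impala source: shows how an incomplete
# impala-membership snapshot right after a statestore restart can cancel
# in-flight queries and starve the scheduler.
from dataclasses import dataclass


@dataclass
class Query:
    query_id: str
    backends: frozenset        # impalads running fragments of this query
    cancelled: bool = False
    reason: str = ""


def on_membership_update(queries, live_backends):
    """Naive policy (assumed): a backend absent from the latest membership
    snapshot is treated as failed, and every query running on it is cancelled."""
    for q in queries:
        missing = q.backends - live_backends
        if missing and not q.cancelled:
            q.cancelled = True
            q.reason = ("Failed due to unreachable impalad(s): "
                        + ", ".join(sorted(missing)))


def schedulable_backends(live_backends, self_addr, self_is_executor):
    """Executors the scheduler may use. The workaround discussed in the comment
    always adds the local impalad, which does nothing for a coordinator-only host."""
    backends = set(live_backends)
    if self_is_executor:
        backends.add(self_addr)
    return backends


if __name__ == "__main__":
    full = frozenset({"tarmstrong-box:22000", "tarmstrong-box:22001",
                      "tarmstrong-box:22002"})
    q = Query("q1", full)

    on_membership_update([q], full)          # steady state: nothing happens
    assert not q.cancelled

    # Assumed: just after the restart only one impalad has re-registered.
    partial = frozenset({"tarmstrong-box:22000"})
    on_membership_update([q], partial)
    print(q.reason)
    # Failed due to unreachable impalad(s): tarmstrong-box:22001, tarmstrong-box:22002

    # With an empty snapshot, a coordinator-only host has nothing to schedule on.
    print(schedulable_backends(frozenset(), "coord:22000", self_is_executor=False))
    # set()
```

The model deliberately keeps the naive "missing means failed" policy, since that is the behavior this JIRA is about; a fix would need the coordinator to distinguish a backend that actually failed from one that simply has not reappeared yet in a freshly restarted statestore's view.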

If we think the immediate issue to resolve is avoiding disruption to running 
queries, then for now let's go ahead with fixing this JIRA, and I'll open a new 
one to track the other issues.

> Bringing up stopped statestore causes queries to fail
> -----------------------------------------------------
>
>                 Key: IMPALA-7665
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7665
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 3.1.0
>            Reporter: Tim Armstrong
>            Assignee: Bikramjeet Vig
>            Priority: Critical
>              Labels: query-lifecycle, statestore
>
> I can reproduce this by running a long-running query then cycling the 
> statestore:
> {noformat}
> tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ impala-shell.sh -q "select distinct * from tpch10_parquet.lineitem"
> Starting Impala Shell without Kerberos authentication
> Connected to localhost:21000
> Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build c486fb9ea4330e1008fa9b7ceaa60492e43ee120)
> Query: select distinct * from tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 17:06:48 (Coordinator: http://tarmstrong-box:25000)
> {noformat}
> If I kill the statestore, the query runs fine, but if I start up the 
> statestore again, it fails.
> {noformat}
> # In one terminal, start up the statestore
> $ /home/tarmstrong/Impala/incubator-impala/be/build/latest/statestore/statestored \
>     -log_filename=statestored \
>     -log_dir=/home/tarmstrong/Impala/incubator-impala/logs/cluster \
>     -v=1 -logbufsecs=5 -max_log_files=10
> # The running query then fails
> WARNINGS: Failed due to unreachable impalad(s): tarmstrong-box:22001, tarmstrong-box:22002
> {noformat}
> Note that I've seen different subsets of impalads reported as failed, e.g. 
> "Failed due to unreachable impalad(s): tarmstrong-box:22001".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
