[
https://issues.apache.org/jira/browse/IMPALA-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825496#comment-16825496
]
Bikramjeet Vig commented on IMPALA-7665:
----------------------------------------
A statestore restart can cause transient failures in any mechanism that depends
on statestore updates, because for a short window after the restart those
updates are incomplete. Consider the following topics and how an incomplete
state would affect Impala:
- impala-membership: causes cancellation of in-flight queries (what this JIRA
will fix). The scheduler would also work with a small subset of the cluster, or
even an empty one, which reduces query performance and can cause new queries to
fail (no backend available to schedule on). The scheduler issue can be fixed by
always adding the current (self) impalad to the list, but that would still
affect query performance and would remain a problem for coordinator-only
hosts.
- impala-request-queue: an incomplete admission control state can result in
over-admission of queries.
- catalog-update: can cause query planning to fail, which makes new queries
fail. This is not a problem when using "local catalog" (catalog V2).
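
To make the impala-membership case concrete, here is a minimal sketch of one
possible mitigation: ignore membership removals for a grace period after the
subscriber reconnects to a restarted statestore. This is an illustration only,
not the actual fix for this JIRA. The class MembershipGuard, its methods, and
the grace_period_s parameter are all hypothetical names and do not correspond
to Impala code or flags.

```python
import time
from typing import Callable, Iterable, Optional, Set


class MembershipGuard:
    """Hypothetical helper: decides which missing backends count as failed."""

    def __init__(self, grace_period_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._grace_period_s = grace_period_s
        self._clock = clock
        self._known: Set[str] = set()
        # Time of the last statestore reconnect, or None if there was none.
        self._recovered_at: Optional[float] = None

    def on_statestore_reconnect(self) -> None:
        # From this point, membership updates are assumed to be incomplete
        # until the grace period has elapsed.
        self._recovered_at = self._clock()

    def in_grace_period(self) -> bool:
        if self._recovered_at is None:
            return False
        return self._clock() - self._recovered_at < self._grace_period_s

    def on_membership_update(self, backends: Iterable[str]) -> Set[str]:
        """Apply an update and return the backends to treat as failed."""
        current = set(backends)
        if self.in_grace_period():
            # Learn about backends that have re-registered, but do not
            # trust removals yet: the update may simply be incomplete.
            self._known |= current
            return set()
        missing = self._known - current
        self._known = current
        return missing
```

With a guard like this, a coordinator would cancel queries only for backends
returned by on_membership_update(), so a partial update received right after
the restart would not cancel queries in flight. The trade-off is that a
backend that really fails during the grace period is detected later.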
If we think the immediate issue to resolve is avoiding disruption to running
queries, then for now let's just go ahead with fixing this JIRA, and I'll open
a new one to track the other issues.
> Bringing up stopped statestore causes queries to fail
> -----------------------------------------------------
>
> Key: IMPALA-7665
> URL: https://issues.apache.org/jira/browse/IMPALA-7665
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 3.1.0
> Reporter: Tim Armstrong
> Assignee: Bikramjeet Vig
> Priority: Critical
> Labels: query-lifecycle, statestore
>
> I can reproduce this by running a long-running query then cycling the
> statestore:
> {noformat}
> tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ impala-shell.sh -q "select distinct * from tpch10_parquet.lineitem"
> Starting Impala Shell without Kerberos authentication
> Connected to localhost:21000
> Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build
> c486fb9ea4330e1008fa9b7ceaa60492e43ee120)
> Query: select distinct * from tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 17:06:48 (Coordinator:
> http://tarmstrong-box:25000)
> {noformat}
> If I kill the statestore, the query runs fine, but if I start up the
> statestore again, it fails.
> {noformat}
> # In one terminal, start up the statestore
> $ /home/tarmstrong/Impala/incubator-impala/be/build/latest/statestore/statestored \
>     -log_filename=statestored \
>     -log_dir=/home/tarmstrong/Impala/incubator-impala/logs/cluster -v=1 \
>     -logbufsecs=5 -max_log_files=10
> # The running query then fails
> WARNINGS: Failed due to unreachable impalad(s): tarmstrong-box:22001,
> tarmstrong-box:22002
> {noformat}
> Note that I've seen different subsets of impalads reported as failed, e.g.
> "Failed due to unreachable impalad(s): tarmstrong-box:22001"