[ 
https://issues.apache.org/jira/browse/IMPALA-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766402#comment-16766402
 ] 

Tim Armstrong commented on IMPALA-7665:
---------------------------------------


* Should we add a mechanism to allow detecting that it's a new statestore? E.g. 
the statestore generates a unique ID each time it starts up.
* Is there a reason that WAIT_TIME needs to match up with the time taken for 
the statestore to detect failure? 30s seems like a safe amount of time to 
expect a healthy impalad to re-register in but would it need to be increased if 
we increased the heartbeat interval?

The scenario where a statestore and impalad go down and up at the same time 
should be rare, and is related to an existing problem with fast crash/restarts 
- IMPALA-414. That's unless we're doing a non-graceful rolling restart of the 
statestore and impalads at the same time. Maybe that's an argument against 
restarting those services at the same time.

> Bringing up stopped statestore causes queries to fail
> -----------------------------------------------------
>
>                 Key: IMPALA-7665
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7665
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 3.1.0
>            Reporter: Tim Armstrong
>            Priority: Critical
>              Labels: query-lifecycle, statestore
>
> I can reproduce this by running a long-running query then cycling the 
> statestore:
> {noformat}
> tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ impala-shell.sh -q 
> "select distinct * from tpch10_parquet.lineitem"
> Starting Impala Shell without Kerberos authentication
> Connected to localhost:21000
> Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build 
> c486fb9ea4330e1008fa9b7ceaa60492e43ee120)
> Query: select distinct * from tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 17:06:48 (Coordinator: 
> http://tarmstrong-box:25000)
> {noformat}
> If I kill the statestore, the query runs fine, but if I start up the 
> statestore again, it fails.
> {noformat}
> # In one terminal, start up the statestore
> $ 
> /home/tarmstrong/Impala/incubator-impala/be/build/latest/statestore/statestored
>  -log_filename=statestored 
> -log_dir=/home/tarmstrong/Impala/incubator-impala/logs/cluster -v=1 
> -logbufsecs=5 -max_log_files=10
> # The running query then fails
> WARNINGS: Failed due to unreachable impalad(s): tarmstrong-box:22001, 
> tarmstrong-box:22002
> {noformat}
> Note that I've seen different subsets impalads reported as failed, e.g. 
> "Failed due to unreachable impalad(s): tarmstrong-box:22001"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to