[jira] [Commented] (IMPALA-9788) Weird things happen when impalad restarts with different hostname but same IP

Thomas Tauber-Marshall (Jira) Mon, 01 Jun 2020 10:31:22 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121191#comment-17121191
 ]


Thomas Tauber-Marshall commented on IMPALA-9788:
------------------------------------------------

I'd have to look into it more, but it's a bit more complicated than that. 
There are two values that matter: the statestore subscriber id, which is still 
based on hostname:port pairs, and the keys in the cluster membership topic, 
which now use unique ids.

The motivation for continuing to base the subscriber ids on hostname:port is 
that if an impalad goes down and a new one is launched in its place, the two 
will have matching subscriber ids. When the new one registers, the statestore 
should recognize the conflict and automatically unsubscribe the old one before 
subscribing the new one.
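
To make the replace-on-conflict behaviour concrete, here is a minimal sketch. 
It is not the actual statestore code: SubscriberRegistry, Registration and 
Register() are hypothetical names, and the only thing taken from the comment 
above is that the subscriber id is a hostname:port string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a statestore-side registration record.
struct Registration {
  std::string subscriber_id;      // "hostname:port"
  std::uint64_t registration_id;  // illustrative: distinguishes incarnations
};

// Hypothetical registry keyed by subscriber id.
class SubscriberRegistry {
 public:
  // Registers a subscriber. If a registration with the same hostname:port
  // already exists, it is dropped first. Returns true when that happened.
  bool Register(const std::string& hostname, int port) {
    assert(!hostname.empty());
    const std::string id = hostname + ":" + std::to_string(port);
    const bool replaced = subscribers_.erase(id) > 0;
    subscribers_[id] = Registration{id, next_registration_id_++};
    return replaced;
  }

  std::size_t size() const { return subscribers_.size(); }

 private:
  std::unordered_map<std::string, Registration> subscribers_;
  std::uint64_t next_registration_id_ = 1;
};
```

With this sketch, registering "a16ac03fc53b" on port 22000 twice leaves one 
entry, because the second call evicts the first. Registering a different 
hostname on the same port (as a new docker container would) adds a second 
entry, since the ids no longer match and no conflict is detected.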

> Weird things happen when impalad restarts with different hostname but same IP
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-9788
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9788
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.4.0
>            Reporter: Tim Armstrong
>            Assignee: Sahil Takiar
>            Priority: Critical
>         Attachments: Screenshot from 2020-05-28 10-53-16.png, 
> get-root-sink-resolved.txt, statestore.log
>
>
> I was messing around with running Impala in a single-node dockerized 
> configuration and ran into a bunch of weirdness when I restarted the 
> impalad. It got into a state where there was a new and an old statestore 
> registration with the same IP/port and different hostnames (since docker 
> generates new hostnames for each incarnation of the container).
> I saw a crash in Coordinator::GetRootSink(). The cause of that is the 
> coordinator treating the same impalad as two distinct backends, and sending 
> two execute RPCs to the backend (this is a single node cluster).
> {noformat}
> I0528 17:32:41.760128   573 coordinator.cc:143] 
> f84b158b036445ad:3a9defdf00000000] Exec() 
> query_id=f84b158b036445ad:3a9defdf00000000 stmt=SELECT COUNT(*) FROM 
> tpcds_kudu.call_center
> I0528 17:32:41.760670   573 coordinator.cc:463] 
> f84b158b036445ad:3a9defdf00000000] starting execution on 2 backends for 
> query_id=f84b158b036445ad:3a9defdf00000000
> ..
> I0528 17:32:41.762449    78 control-service.cc:153] 
> f84b158b036445ad:3a9defdf00000000] ExecQueryFInstances(): 
> query_id=f84b158b036445ad:3a9defdf00000000 coord=a16ac03fc53b:22000 
> #instances=1
> I0528 17:32:41.761706    79 control-service.cc:153] 
> f84b158b036445ad:3a9defdf00000000] ExecQueryFInstances(): 
> query_id=f84b158b036445ad:3a9defdf00000000 coord=a16ac03fc53b:22000 
> #instances=4
> ..
> Wrote minidump to 
> /opt/impala/logs/minidumps/impalad/15727084-c931-49e1-62d37e86-75cfe0f6.dmp
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00000000011a0d50, pid=1, tid=0x00007f92b5e8c700
> #
> # JRE version: OpenJDK Runtime Environment (8.0_242-b08) (build 
> 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08)
> # Java VM: OpenJDK 64-Bit Server VM (25.242-b08 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> Wrote minidump to 
> /opt/impala/logs/minidumps/impalad/15727084-c931-49e1-62d37e86-75cfe0f6.dmp
> # C  [impalad+0xda0d50]  impala::FragmentInstanceState::GetRootSink() 
> const+0x0
> #
> # Core dump written. Default location: /opt/impala/core or core.1
> #
> # An error report file with more information is saved as:
> # /opt/impala/hs_err_pid1.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> {noformat}
> CC [~twm378]
> At a separate time I saw it trip the "Tried to add existing backend to 
> executor group" case in ExecutorGroup::AddExecutor().
> {noformat}
> void ExecutorGroup::AddExecutor(const BackendDescriptorPB& be_desc) {
>     // be_desc.is_executor can be false for the local backend when scheduling
>     // queries to run on the coordinator host.
>     DCHECK(!be_desc.ip_address().empty());
>     Executors& be_descs = executor_map_[be_desc.ip_address()];
>     auto eq = [&be_desc](const BackendDescriptorPB& existing) {
>       // The IP addresses must already match, so it is sufficient to check
>       // the port.
>       DCHECK_EQ(existing.ip_address(), be_desc.ip_address());
>       return existing.address().port() == be_desc.address().port();
>     };
>     if (find_if(be_descs.begin(), be_descs.end(), eq) != be_descs.end()) {
>       LOG(DFATAL) << "Tried to add existing backend to executor group: "
>                   << be_desc.krpc_address();
>       return;
>     }
>     if (!CheckConsistencyOrWarn(be_desc)) {
>       LOG(WARNING) << "Ignoring inconsistent backend for executor group: "
>                    << be_desc.krpc_address();
>       return;
>     }
>     if (be_descs.empty()) {
>       executor_ip_hash_ring_.AddNode(be_desc.ip_address());
>     }
>     be_descs.push_back(be_desc);
>     executor_ip_map_[be_desc.address().hostname()] = be_desc.ip_address();
>   }
> {noformat}
> I'm not sure if using the hostname to identify impalads is even useful at 
> this point; we could probably simplify this by using the IP address only.
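
As a rough illustration of the suggestion above, the sketch below counts how 
many distinct backends a set of registrations yields under each keying 
scheme. The Backend struct, KeyBy enum and both functions are hypothetical 
and are not Impala code. Keeping the port in the key is an assumption, 
chosen because AddExecutor() compares ports among backends on the same IP.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified backend descriptor.
struct Backend {
  std::string hostname;
  std::string ip_address;
  int port;
};

enum class KeyBy { kHostnamePort, kIpPort };

// Builds the identity key for a backend under the chosen scheme.
std::string MakeKey(const Backend& be, KeyBy mode) {
  const std::string& host =
      (mode == KeyBy::kHostnamePort) ? be.hostname : be.ip_address;
  return host + ":" + std::to_string(be.port);
}

// Returns how many distinct backends the registrations represent.
std::size_t CountDistinctBackends(const std::vector<Backend>& backends,
                                  KeyBy mode) {
  std::set<std::string> keys;
  for (const Backend& be : backends) {
    assert(!be.ip_address.empty());
    keys.insert(MakeKey(be, mode));
  }
  return keys.size();
}
```

For an old and a new registration of the same container (two different 
hostnames, one shared IP and port), keying by hostname:port counts two 
backends, which matches the "2 backends" seen in the coordinator log, while 
keying by IP:port counts one.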



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

