Tim Armstrong created IMPALA-9788:
-------------------------------------

             Summary: Weird things happen when impalad restarts with different 
hostname but same IP
                 Key: IMPALA-9788
                 URL: https://issues.apache.org/jira/browse/IMPALA-9788
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 3.4.0
            Reporter: Tim Armstrong
            Assignee: Sahil Takiar
         Attachments: get-root-sink-resolved.txt

I was experimenting with running Impala in a single-node dockerized 
configuration and ran into a bunch of weirdness after I restarted the 
impalad. It got into a state where there was both a new and an old statestore 
registration with the same IP/port but different hostnames (since Docker 
generates a new hostname for each incarnation of the container).

I saw a crash in Coordinator::GetRootSink(). The cause of that is the 
coordinator treating the same impalad as two distinct backends, and sending two 
execute RPCs to the backend (this is a single node cluster).
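
A minimal sketch of how the double-counting can arise, assuming cluster membership is deduplicated by hostname and port. The `Registration` struct and both helper functions are hypothetical stand-ins for illustration, not Impala's actual types or code paths:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for one statestore registration.
struct Registration {
  std::string hostname;
  std::string ip;
  int port;
};

// Counts distinct backends when registrations are keyed by (hostname, port).
// A restarted container with a new hostname shows up as a second backend.
std::size_t CountBackendsByHostname(const std::vector<Registration>& regs) {
  std::set<std::pair<std::string, int>> keys;
  for (const Registration& r : regs) keys.emplace(r.hostname, r.port);
  return keys.size();
}

// Counts distinct backends when registrations are keyed by (ip, port).
// The old and new registrations collapse into a single backend.
std::size_t CountBackendsByIp(const std::vector<Registration>& regs) {
  std::set<std::pair<std::string, int>> keys;
  for (const Registration& r : regs) {
    assert(!r.ip.empty());  // Mirrors the non-empty IP precondition.
    keys.emplace(r.ip, r.port);
  }
  return keys.size();
}
```

With two registrations that share an IP and port but differ in hostname, the first function reports two backends and the second reports one, which would match the "starting execution on 2 backends" line in the log below.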

{noformat}
I0528 17:32:41.760128   573 coordinator.cc:143] 
f84b158b036445ad:3a9defdf00000000] Exec() 
query_id=f84b158b036445ad:3a9defdf00000000 stmt=SELECT COUNT(*) FROM 
tpcds_kudu.call_center
I0528 17:32:41.760670   573 coordinator.cc:463] 
f84b158b036445ad:3a9defdf00000000] starting execution on 2 backends for 
query_id=f84b158b036445ad:3a9defdf00000000
..
I0528 17:32:41.762449    78 control-service.cc:153] 
f84b158b036445ad:3a9defdf00000000] ExecQueryFInstances(): 
query_id=f84b158b036445ad:3a9defdf00000000 coord=a16ac03fc53b:22000 #instances=1
I0528 17:32:41.761706    79 control-service.cc:153] 
f84b158b036445ad:3a9defdf00000000] ExecQueryFInstances(): 
query_id=f84b158b036445ad:3a9defdf00000000 coord=a16ac03fc53b:22000 #instances=4
..
Wrote minidump to 
/opt/impala/logs/minidumps/impalad/15727084-c931-49e1-62d37e86-75cfe0f6.dmp
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00000000011a0d50, pid=1, tid=0x00007f92b5e8c700
#
# JRE version: OpenJDK Runtime Environment (8.0_242-b08) (build 
1.8.0_242-8u242-b08-0ubuntu3~18.04-b08)
# Java VM: OpenJDK 64-Bit Server VM (25.242-b08 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
Wrote minidump to 
/opt/impala/logs/minidumps/impalad/15727084-c931-49e1-62d37e86-75cfe0f6.dmp
# C  [impalad+0xda0d50]  impala::FragmentInstanceState::GetRootSink() const+0x0
#
# Core dump written. Default location: /opt/impala/core or core.1
#
# An error report file with more information is saved as:
# /opt/impala/hs_err_pid1.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

[error occurred during error reporting , id 0xb]
{noformat}
CC [~twm378]


At a separate time I saw it trip the "Tried to add existing backend to executor 
group" case in ExecutorGroup::AddExecutor().
{noformat}
void ExecutorGroup::AddExecutor(const BackendDescriptorPB& be_desc) {
    // be_desc.is_executor can be false for the local backend when scheduling queries to run
    // on the coordinator host.
    DCHECK(!be_desc.ip_address().empty());
    Executors& be_descs = executor_map_[be_desc.ip_address()];
    auto eq = [&be_desc](const BackendDescriptorPB& existing) {
      // The IP addresses must already match, so it is sufficient to check the port.
      DCHECK_EQ(existing.ip_address(), be_desc.ip_address());
      return existing.address().port() == be_desc.address().port();
    };
    if (find_if(be_descs.begin(), be_descs.end(), eq) != be_descs.end()) {
      LOG(DFATAL) << "Tried to add existing backend to executor group: "
                  << be_desc.krpc_address();
      return;
    }
    if (!CheckConsistencyOrWarn(be_desc)) {
      LOG(WARNING) << "Ignoring inconsistent backend for executor group: "
                   << be_desc.krpc_address();
      return;
    }
    if (be_descs.empty()) {
      executor_ip_hash_ring_.AddNode(be_desc.ip_address());
    }
    be_descs.push_back(be_desc);
    executor_ip_map_[be_desc.address().hostname()] = be_desc.ip_address();
  }
{noformat}

I'm not sure that using the hostname to identify impalads is even useful at this 
point; we could probably simplify this by using the IP address only.
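
A rough sketch of what IP-only identification could look like, assuming a new registration on an existing IP/port should replace the stale one rather than be added alongside it. `BackendInfo` and `BackendRegistry` are hypothetical names for illustration, not existing Impala classes, and the replace-on-conflict policy is an assumption rather than a decided design:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified backend descriptor.
struct BackendInfo {
  std::string hostname;
  std::string ip;
  int port;
};

// Hypothetical registry that identifies a backend purely by (ip, port).
class BackendRegistry {
 public:
  // Returns true if a new endpoint was added, false if an existing entry
  // for the same (ip, port) was replaced by the newer registration.
  bool Register(const BackendInfo& be) {
    assert(!be.ip.empty());
    const std::pair<std::string, int> key(be.ip, be.port);
    auto it = backends_.find(key);
    if (it != backends_.end()) {
      it->second = be;  // Same endpoint: keep only the latest hostname.
      return false;
    }
    backends_.emplace(key, be);
    return true;
  }

  // Returns the registered backend for (ip, port), or nullptr if none.
  const BackendInfo* Find(const std::string& ip, int port) const {
    auto it = backends_.find(std::make_pair(ip, port));
    return it == backends_.end() ? nullptr : &it->second;
  }

  std::size_t size() const { return backends_.size(); }

 private:
  std::map<std::pair<std::string, int>, BackendInfo> backends_;
};
```

Under this scheme the hostname is carried only as metadata, so a container restart that changes the hostname never produces a second backend for the same endpoint.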


