[jira] [Commented] (IMPALA-12295) Statestore crashed when restarting catalogd

ASF subversion and git services (Jira) Tue, 18 Jul 2023 20:41:15 -0700


    [ https://issues.apache.org/jira/browse/IMPALA-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17744426#comment-17744426 ]


ASF subversion and git services commented on IMPALA-12295:
----------------------------------------------------------

Commit 97e44c11923f3d28e08aba1b5dd66b8a35465deb in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=97e44c119 ]

IMPALA-12295: Statestore crashed when restarting catalogd

The statestore hit a DCHECK when catalogd re-registered while catalogd
HA was not enabled. Re-registering a catalogd should not increase the
number of registered catalogds.
The issue can be reproduced with the test case
test_restart_services.py::TestRestart::test_restart_catalog by
increasing the value of statestore_heartbeat_frequency_ms.
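
To illustrate the invariant described above, here is a minimal Python sketch of a registry whose count does not grow when a known catalogd registers again. It is a toy model under stated assumptions, not the code from this commit: the class name CatalogdRegistry, the register() method, and keying instances by a subscriber id are all hypothetical. Only the limit of 2 is taken from the failed check (num_registered_catalogd_ < 2) in statestored.ERROR.

```python
class CatalogdRegistry:
    """Toy model of a registry that counts registered catalogd instances.

    Hypothetical illustration only; it does not mirror Impala's
    StatestoreCatalogdMgr implementation.
    """

    # Upper bound taken from the failed check `num_registered_catalogd_ < 2`.
    MAX_CATALOGDS = 2

    def __init__(self):
        # Assumption: each catalogd is identified by a stable subscriber id.
        self._registered = {}  # subscriber id -> address

    @property
    def num_registered(self):
        return len(self._registered)

    def register(self, subscriber_id, address):
        """Register a catalogd; return True only for a new instance.

        A re-registration (same subscriber id, e.g. after a restart)
        refreshes the stored address but leaves the count unchanged.
        """
        is_new = subscriber_id not in self._registered
        if is_new and self.num_registered >= self.MAX_CATALOGDS:
            raise RuntimeError("too many catalogd instances registered")
        self._registered[subscriber_id] = address
        return is_new


registry = CatalogdRegistry()
registry.register("catalogd-1", "host-a:26000")  # first registration
registry.register("catalogd-1", "host-a:26000")  # restart: re-registration
print(registry.num_registered)
```

Because the count is derived from the set of known ids instead of being incremented on every call, repeated registrations from the same catalogd cannot push it past the limit.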

Testing:
 - Verified that the issue no longer occurs for the test case
   test_restart_services.py::TestRestart::test_restart_catalog.
 - Passed core tests.

Change-Id: I031f0c6d895601e7ea8b15005a3ad52bd3254e7c
Reviewed-on: http://gerrit.cloudera.org:8080/20217
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Wenzhe Zhou <[email protected]>


> Statestore crashed when restarting catalogd
> -------------------------------------------
>
>                 Key: IMPALA-12295
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12295
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Wenzhe Zhou
>            Priority: Critical
>
> I restarted catalogd in my dev env and saw the statestore crash.
> {noformat}
> $ bin/start-impala-cluster.py --restart_catalogd_only 
> 19:39:44 MainThread: Found 3 impalad/1 statestored/0 catalogd process(es)
> 19:39:44 MainThread: Starting Catalog Service logging to /home/quanlong/workspace/Impala/logs/cluster/catalogd.INFO
> 19:39:47 MainThread: Found 3 impalad/0 statestored/1 catalogd process(es)
> 19:39:48 MainThread: Found 3 impalad/0 statestored/1 catalogd process(es)
> 19:39:49 MainThread: Found 3 impalad/0 statestored/1 catalogd process(es)
> 19:39:50 MainThread: Found 3 impalad/0 statestored/1 catalogd process(es)
> 19:39:51 MainThread: Found 3 impalad/0 statestored/1 catalogd process(es)
> 19:39:53 MainThread: Found 3 impalad/0 statestored/1 catalogd process(es)
> 19:39:54 MainThread: Found 3 impalad/0 statestored/1 catalogd process(es)
> 19:39:55 MainThread: Found 3 impalad/0 statestored/1 catalogd process(es)
> 19:39:56 MainThread: Found 3 impalad/0 statestored/1 catalogd process(es)
> 19:39:57 MainThread: Found 3 impalad/0 statestored/1 catalogd process(es)
> 19:39:58 MainThread: Found 3 impalad/0 statestored/1 catalogd process(es)
> 19:39:58 MainThread: Error starting cluster
> Traceback (most recent call last):
>   File "bin/start-impala-cluster.py", line 930, in <module>
>     expected_cluster_size - expected_catalog_delays)
>   File "/home/quanlong/workspace/Impala/tests/common/impala_cluster.py", line 185, in wait_until_ready
>     self.wait_for_num_impalads(expected_num_impalads)
>   File "/home/quanlong/workspace/Impala/tests/common/impala_cluster.py", line 231, in wait_for_num_impalads
>     raise RuntimeError(msg)
> RuntimeError: statestored failed to start.
> {noformat}
> statestored.ERROR shows the DCHECK failure:
> {noformat}
> F0718 19:39:47.096524 11460 statestore-catalogd-mgr.cc:58] Check failed: num_registered_catalogd_ < 2
> *** Check failure stack trace: ***
>     @          0x38371ed  google::LogMessage::Fail()
>     @          0x3839124  google::LogMessage::SendToLog()
>     @          0x3836bcc  google::LogMessage::Flush()
>     @          0x3839649  google::LogMessageFatal::~LogMessageFatal()
>     @          0x17c3540  impala::StatestoreCatalogdMgr::RegisterCatalogd()
>     @          0x17a32fe  impala::Statestore::RegisterSubscriber()
>     @          0x17c27ed  StatestoreThriftIf::RegisterSubscriber()
>     @          0x17c0a92  impala::StatestoreServiceProcessorT<>::process_RegisterSubscriber()
>     @          0x17c30b3  impala::StatestoreServiceProcessorT<>::dispatchCall()
>     @           0xee68df  apache::thrift::TDispatchProcessor::process()
>     @          0x131e224  apache::thrift::server::TAcceptQueueServer::Task::run()
>     @          0x130ab89  impala::ThriftThread::RunRunnable()
>     @          0x130c7b1  boost::detail::function::void_function_obj_invoker0<>::invoke()
>     @          0x1930e30  impala::Thread::SuperviseThread()
>     @          0x1931c39  boost::detail::thread_data<>::run()
>     @          0x2359067  thread_proxy
>     @     0x7f5cbef5a6db  start_thread
>     @     0x7f5cbbcc061f  clone
> {noformat}
> The cluster runs without catalogd HA.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
