[ https://issues.apache.org/jira/browse/IMPALA-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950616#comment-17950616 ]
ASF subversion and git services commented on IMPALA-13850:
----------------------------------------------------------

Commit be16a02fa8f98da09d572d8250363896ae10b9e7 in impala's branch refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=be16a02fa ]

IMPALA-13850 (part 2): Fix bug found by test_restart_services.py

This patch stabilizes test_restart_catalogd_with_local_catalog in test_restart_services.py after the first part of IMPALA-13850 was merged.

IMPALA-13850 (part 1) makes local catalog mode send the statestore update twice: the first announces its availability and service id, while the second is the full topic update. There is a brief window in which CatalogD accepts getCatalogObject() requests before the very first CatalogServiceCatalog.reset() has been initiated and has obtained the write lock. A request that goes through in that window might see an empty catalog, which results in query failures reporting that the db/table does not exist.

This patch blocks CatalogServiceThriftIf.AcceptRequest() until CatalogServiceCatalog.reset() has been initiated. Catalog version 100 is used to signal that the initial reset has begun. Later, in part 3, when we implement in-place metadata cache reset, AcceptRequest() can unblock sooner because reset() will release the write lock during catalog cache initialization.

Testing:
- Looped test_restart_catalogd_with_local_catalog 100 times; all runs passed.
Change-Id: I97f6f692506de0bbf2e1445f83bed824dc8298fd
Reviewed-on: http://gerrit.cloudera.org:8080/22844
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> Catalogd should not start metadata operation until initialization is done if HA is enabled
> ------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-13850
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13850
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Wenzhe Zhou
>            Assignee: Riza Suminto
>            Priority: Critical
>
> In a case reported by a user, catalogd initialization failed to complete.
> Log messages showed that catalog HA was enabled. catalogd was blocked while
> trying to acquire "CatalogServer.catalog_lock_" when calling
> CatalogServer::UpdateActiveCatalogd() during statestore subscriber
> registration.
> Log messages showed that an IM command was issued before catalogd tried to
> register with the statestore.
> {code:java}
> I0310 12:21:34.093617 1 CatalogServiceCatalog.java:2188] Invalidated all metadata.
> I0310 12:21:34.094341 1 thrift-server.cc:419] ThriftServer 'StatestoreSubscriber' started on port: 23020
> I0310 12:21:34.094341 1816 TAcceptQueueServer.cpp:329] connection_setup_thread_pool_size is set to 2
> I0310 12:21:34.094586 1 thrift-util.cc:198] TSocket::open() error on socket (after THRIFT_POLL) <Host: localhost Port: 23020>: Connection refused
> I0310 12:21:34.094790 1 statestore-subscriber.cc:745] Starting statestore subscriber
> {code}
> We should not allow any metadata operation until initialization is done. When
> HA is enabled, catalog-server should not hold "CatalogServer.catalog_lock_"
> for a long time before the active catalogd is assigned.
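
For readers who want a concrete picture of the gating idea in the commit message (hold incoming requests until the initial reset() has begun, signalled by catalog version 100), the following is a minimal standalone sketch. It is not the Impala implementation: the class InitialResetGate, its methods, and the timeout parameter are hypothetical names invented for illustration, and treating "version >= 100" as the unblock condition is an assumption drawn only from the commit text.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical illustration only; not Impala code.
 * Request handlers wait here until the catalog version reaches the value
 * that the commit message says signals "initial reset has begun".
 */
final class InitialResetGate {
  // Taken from the commit message; using it as a >= threshold is an assumption.
  static final long INITIAL_RESET_BEGIN_VERSION = 100L;

  private long catalogVersion = 0L;

  /** Called by the (hypothetical) reset path once it has started. */
  synchronized void advanceVersion(long newVersion) {
    if (newVersion > catalogVersion) {
      catalogVersion = newVersion;
      notifyAll();  // wake any request threads blocked in awaitInitialReset()
    }
  }

  /**
   * Blocks until the initial reset has begun or the timeout elapses.
   * Returns true if the request may proceed, false on timeout or interrupt.
   */
  synchronized boolean awaitInitialReset(long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (catalogVersion < INITIAL_RESET_BEGIN_VERSION) {
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return false;  // timed out before the reset started
      }
      try {
        // Timed Object.wait on this monitor; loop re-checks the condition.
        TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // preserve the interrupt status
        return false;
      }
    }
    return true;
  }
}
```

In this sketch a request handler would call awaitInitialReset() before serving something like getCatalogObject(), so it never observes the empty pre-reset catalog. The timeout is an addition made for the example; the commit message does not say whether the real AcceptRequest() wait is bounded.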