[
https://issues.apache.org/jira/browse/IMPALA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joe McDonnell resolved IMPALA-9549.
-----------------------------------
Fix Version/s: Impala 4.0
Target Version: Impala 4.0
Resolution: Fixed
> Impalad startup fails to wait for catalogd to startup when using local catalog
> ------------------------------------------------------------------------------
>
> Key: IMPALA-9549
> URL: https://issues.apache.org/jira/browse/IMPALA-9549
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Critical
> Fix For: Impala 4.0
>
>
> Since Impala coordinators and executors may be starting up at the same time
> as the catalogd, they should be tolerant of delays in the catalogd starting
> up. When using local catalog (use_local_catalog=true), the Impalads fail with
> the following error if the catalogd startup is delayed:
> {noformat}
> I0323 14:22:03.151849 29565 jni-util.cc:288] org.apache.impala.catalog.local.LocalCatalogException: Unable to load database names
> I0323 14:22:03.151849 29565 jni-util.cc:288] org.apache.impala.catalog.local.LocalCatalogException: Unable to load database names
> at org.apache.impala.catalog.local.LocalCatalog.loadDbs(LocalCatalog.java:94)
> at org.apache.impala.catalog.local.LocalCatalog.getDbs(LocalCatalog.java:83)
> at org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:753)
> at org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:220)
> Caused by: org.apache.thrift.TException: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)
> at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:382)
> at org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:174)
> at org.apache.impala.catalog.local.CatalogdMetaProvider$1.call(CatalogdMetaProvider.java:583)
> at org.apache.impala.catalog.local.CatalogdMetaProvider$1.call(CatalogdMetaProvider.java:578)
> at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:509)
> at org.apache.impala.catalog.local.CatalogdMetaProvider.loadDbList(CatalogdMetaProvider.java:577)
> at org.apache.impala.catalog.local.LocalCatalog.loadDbs(LocalCatalog.java:92)
> ... 3 more
> Caused by: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)
> at org.apache.impala.service.FeSupport.NativeGetPartialCatalogObject(Native Method)
> at org.apache.impala.service.FeSupport.GetPartialCatalogObject(FeSupport.java:440)
> at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:380)
> ... 9 more
> I0323 14:22:03.217051 29565 status.cc:126] LocalCatalogException: Unable to load database names
> CAUSED BY: TException: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)
> {noformat}
> What happens is that the ImpalaServer constructor calls
> ImpalaServer::UpdateCatalogMetrics()
> ([https://github.com/apache/impala/blob/3b833902519fb8f0ef9b5fd20919c5fd85d22fcf/be/src/service/impala-server.cc#L452]).
> UpdateCatalogMetrics() maintains two metrics that track the number of
> databases and the number of tables. Through Frontend.getCatalogMetrics(),
> it ends up calling org.apache.impala.catalog.local.LocalCatalog.getDbs(),
> which calls loadDbs()
> ([https://github.com/apache/impala/blob/ca0785ec206f27f06d8d6fd1b710779e548bbd8e/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java#L83]).
> loadDbs() requires a connection to catalogd and fails if it cannot connect.
> Importantly, all of this happens in the constructor, before the regular
> ImpalaServer::Start() waits for the catalogd to start up:
> {code:cpp}
> if (FLAGS_is_coordinator) exec_env_->frontend()->WaitForCatalog();
> {code}
>
> In the old catalog implementation (use_local_catalog=false), the getDbs()
> call on the catalog returns whatever values it already has and does not try
> to contact the catalogd, which is why the regular case does not see this
> problem.
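
The ordering problem described above can be illustrated with a small, self-contained sketch. This is not Impala code and not necessarily the change that resolved this issue: FakeCatalogd, OnDemandCatalog, CachedCatalog, and StartupSketch are hypothetical names made up for illustration, and the polling loop only stands in for what WaitForCatalog() is assumed to do. It shows that a metrics read that must contact catalogd fails while catalogd is still starting, that a read served from local state does not, and that the on-demand read succeeds once it runs after the wait.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for catalogd: refuses a fixed number of calls, then answers.
class FakeCatalogd {
    private int refusalsLeft;

    FakeCatalogd(int refusalsLeft) { this.refusalsLeft = refusalsLeft; }

    // Returns database names, or throws while the service is still "starting".
    List<String> listDbs() {
        if (refusalsLeft > 0) {
            refusalsLeft--;
            throw new IllegalStateException("connect() failed: Connection refused");
        }
        return Arrays.asList("default", "functional");
    }
}

interface Catalog {
    int numDbs();
}

// Local-catalog style (assumption): every read goes to catalogd.
class OnDemandCatalog implements Catalog {
    private final FakeCatalogd catalogd;

    OnDemandCatalog(FakeCatalogd catalogd) { this.catalogd = catalogd; }

    public int numDbs() { return catalogd.listDbs().size(); }
}

// Old-catalog style (assumption): reads return whatever is held locally.
class CachedCatalog implements Catalog {
    private final List<String> cached = new ArrayList<>();

    public int numDbs() { return cached.size(); }
}

class StartupSketch {
    // Polls until catalogd answers and returns the number of attempts used.
    // A real loop would sleep or back off between attempts; omitted here.
    static int waitForCatalog(FakeCatalogd catalogd, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                catalogd.listDbs();
                return attempt;
            } catch (IllegalStateException e) {
                // catalogd is not up yet; try again
            }
        }
        throw new IllegalStateException("catalogd did not start in time");
    }

    // Models the metrics update running before any wait; true if it succeeded.
    static boolean metricsBeforeWaitSucceeds(Catalog catalog) {
        try {
            catalog.numDbs();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        FakeCatalogd slow = new FakeCatalogd(3);
        Catalog local = new OnDemandCatalog(slow);
        System.out.println("on-demand metrics before wait ok: "
            + metricsBeforeWaitSucceeds(local));
        System.out.println("cached metrics before wait ok: "
            + metricsBeforeWaitSucceeds(new CachedCatalog()));
        System.out.println("attempts until catalogd answered: "
            + waitForCatalog(slow, 10));
        System.out.println("on-demand db count after wait: " + local.numDbs());
    }
}
```

In this sketch the failure depends only on the order of the calls, which matches the report: the same on-demand read that fails before the wait succeeds after it.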