[
https://issues.apache.org/jira/browse/IMPALA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572224#comment-16572224
]
Balazs Jeszenszky commented on IMPALA-7168:
-------------------------------------------
To rephrase, the issue is that a new subscriber joins with a catalog version
of 0, which gets propagated as the minimum topic version for the catalog
topic. If the new subscriber fails to process the initial update and keeps
re-requesting it (pinning its catalog version at 0), SYNC_DDL queries will
hang.
Until it has processed an initial catalog update, a coordinator will not
serve any queries, so its metadata staleness isn't relevant for the
purposes of SYNC_DDL. Maybe it's enough to just ignore 0 values for the
minimum subscriber topic version?
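
A rough sketch of what "ignore 0 values" could look like. This is not the
actual statestore code: SubscriberVersions, kNoVersion,
MinTopicVersionIgnoringZero and SyncDdlSatisfied are hypothetical names made
up for illustration, and the map stands in for whatever per-subscriber
bookkeeping the statestore really keeps. It also assumes that a version of 0
only ever means "no initial update processed yet".

```cpp
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// Hypothetical stand-in for per-subscriber bookkeeping:
// subscriber id -> last catalog topic version that subscriber processed.
using SubscriberVersions = std::map<std::string, int64_t>;

// Sentinel returned when no subscriber has processed any update yet.
constexpr int64_t kNoVersion = std::numeric_limits<int64_t>::max();

// Minimum processed version across subscribers, skipping subscribers that
// are still at version 0 (assumed to mean "initial update not processed").
int64_t MinTopicVersionIgnoringZero(const SubscriberVersions& versions) {
  int64_t min_version = kNoVersion;
  for (const auto& entry : versions) {
    if (entry.second == 0) continue;  // not serving queries yet, so skip it
    if (entry.second < min_version) min_version = entry.second;
  }
  return min_version;
}

// A SYNC_DDL wait on ddl_version would be satisfied once every subscriber
// that has processed at least one update has caught up to ddl_version.
bool SyncDdlSatisfied(const SubscriberVersions& versions, int64_t ddl_version) {
  return MinTopicVersionIgnoringZero(versions) >= ddl_version;
}
```

With Node A at version 42 and a stuck Node B at 0, the minimum would come out
as 42 instead of 0, so a SYNC_DDL wait on version 42 would return. A
subscriber stuck at a nonzero version would still hold the minimum back,
which is the intended SYNC_DDL behaviour.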
> DML query may hang if CatalogUpdateCallback() encounters repeated error
> -----------------------------------------------------------------------
>
> Key: IMPALA-7168
> URL: https://issues.apache.org/jira/browse/IMPALA-7168
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0,
> Impala 2.12.0
> Reporter: Pranay Singh
> Priority: Major
>
> DML queries or INSERT will hang if
> exec_env_->frontend()->UpdateCatalogCache() in
> ImpalaServer::CatalogUpdateCallback encounters a repeated error such as
> ENOMEM. This happens with SYNC_DDL set to 1 when the coordinator node is
> waiting for its catalog version to become current.
> The scenario looks like this. Let's say there are two coordinator nodes,
> Node A and Node B, and catalogd and statestored are running on Node C.
> a) CREATE TABLE is executed on Node A with SYNC_DDL set to 1. The thread
> running the query blocks in
> impala::ImpalaServer::ProcessCatalogUpdateResult(), waiting for its catalog
> version to become current.
> b) Meanwhile, statestored running on Node C calls
> ImpalaServer::CatalogUpdateCallback on Node B via Thrift RPC to do a delta
> topic update, which will not be applied if we encounter repeated errors,
> say because the frontend is low on memory (low JVM heap situation).
> c) In such a case, Node A will wait indefinitely for its catalog version
> to become current, until Node B is shut down voluntarily.
> Note: This is a case where Node B is reachable (heartbeat is fine), but the
> node is in a bad, non-working state.