[ 
https://issues.apache.org/jira/browse/IMPALA-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629491#comment-16629491
 ] 

Vuk Ercegovac commented on IMPALA-7597:
---------------------------------------

The issue reported here is one example of an InconsistentMetadataFetchException
thrown by code that is not covered by the retry loop in createExecRequest.

Working backwards, all of these are thrown from sendRequest in
CatalogMetaProvider when it fetches from catalogd and catalogd either 1) does
not find an expected object (e.g., the database might have been deleted and we
are now fetching its list of table names, which is no longer valid) or 2) finds
that versions mismatch due to an interleaved write.
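The two failure cases can be illustrated with a toy model of catalogd. This is a
minimal sketch under stated assumptions, not Impala's actual
CatalogMetaProvider/catalogd protocol: ToyCatalogd and its methods are
hypothetical, every write is assumed to bump a per-table version, and the
exception class is a local stand-in for Impala's unchecked
InconsistentMetadataFetchException.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local stand-in for Impala's unchecked exception of the same name.
class InconsistentMetadataFetchException extends RuntimeException {
  InconsistentMetadataFetchException(String msg) { super(msg); }
}

// Hypothetical toy catalogd: each table carries a version bumped on every write.
class ToyCatalogd {
  static final class Table {
    final long version;
    final List<String> partitions;

    Table(long version, List<String> partitions) {
      this.version = version;
      this.partitions = List.copyOf(partitions);
    }
  }

  private final Map<String, Table> tables = new HashMap<>();
  private long nextVersion = 1;

  /** Creates or replaces a table; every write gets a fresh version. */
  synchronized void putTable(String name, List<String> partitions) {
    tables.put(name, new Table(nextVersion++, partitions));
  }

  synchronized void dropTable(String name) { tables.remove(name); }

  /** Returns the version the caller caches along with the table. */
  synchronized long getTableVersion(String name) {
    Table t = tables.get(name);
    if (t == null) throw new InconsistentMetadataFetchException("table not found: " + name);
    return t.version;
  }

  /** Fetches partitions for the cached version, failing in either case described above. */
  synchronized List<String> listPartitions(String name, long expectedVersion) {
    Table t = tables.get(name);
    if (t == null) {
      // Case 1: the object the caller expects no longer exists.
      throw new InconsistentMetadataFetchException("table not found: " + name);
    }
    if (t.version != expectedVersion) {
      // Case 2: a write interleaved between the two fetches.
      throw new InconsistentMetadataFetchException(
          "version mismatch for " + name + ": expected " + expectedVersion + ", found " + t.version);
    }
    return t.partitions;
  }
}
```

A caller that cached version v of a table and then fetches its partitions
after a concurrent write gets an exception rather than a mix of old and new
metadata.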

Such inconsistencies are possible at every step of the schema hierarchy, e.g., 
list dbs, get db info, list table names, load table, load table col stats, list 
partitions, load partition(s), list functions, load function.

With the push architecture ("v1"), many of these operations would succeed, but
with potentially stale data. For example, if a table is present locally, its
partitions are also present, so "show partitions" would complete. With the pull
architecture ("v2"), if, for example, a new partition is added or the table is
dropped after the table is cached but before its partitions are fetched, the
change is reported as an exception. While the exception reflects a more current
state, it is a change in behavior from "v1", where a stale result can be
returned. A follow-up operation, such as listing the tables of a database that
was returned by show databases but has since been dropped, would then simply
fail with an error stating that the database does not exist.
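The "v1"/"v2" difference for "show partitions" can be sketched with two toy
caches over the same mutable source. All names here (Catalogd, PushCache,
PullCache) are hypothetical, and modeling "v2" as a single table-version check
is a simplifying assumption rather than how LocalCatalog actually works.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for Impala's unchecked exception of the same name.
class InconsistentMetadataFetchException extends RuntimeException {
  InconsistentMetadataFetchException(String msg) { super(msg); }
}

/** Hypothetical source of truth: one table whose version bumps on each change. */
final class Catalogd {
  long version = 1;
  final List<String> partitions = new ArrayList<>(List.of("p=1"));

  void addPartition(String p) {
    partitions.add(p);
    version++;
  }
}

/** "v1" (push): the table and its partitions are copied locally as one snapshot. */
final class PushCache {
  private final List<String> partitions;

  PushCache(Catalogd c) { partitions = List.copyOf(c.partitions); }

  /** Always succeeds, but may return stale partitions. */
  List<String> showPartitions() { return partitions; }
}

/** "v2" (pull): only the table version is cached; partitions are fetched on demand. */
final class PullCache {
  private final Catalogd catalogd;
  private final long cachedVersion;

  PullCache(Catalogd c) {
    catalogd = c;
    cachedVersion = c.version;
  }

  /** Returns current partitions, or throws if the table changed after it was cached. */
  List<String> showPartitions() {
    if (catalogd.version != cachedVersion) {
      throw new InconsistentMetadataFetchException("table changed since it was cached");
    }
    return List.copyOf(catalogd.partitions);
  }
}
```

After a concurrent addPartition, the push cache keeps returning the partitions
it copied, while the pull cache surfaces the change as an exception that the
caller must either retry or report.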

For queries, we chose to retry explicitly. One option here is to retry all such
operations. We could do so with a retrying wrapper that has the same interface
(similar to the HMS retrying client). However, that may be too heavyweight an
approach. For example, getCatalogMetrics (and its callers) should be able to
proceed when such an exception arises: it is for internal bookkeeping and can
be skipped. An alternative is to provide a retrying wrapper that is easy to
obtain (a first thought is to add something alongside getCatalog in Frontend,
e.g., getRetryableCatalog) and to use it where needed. Further alternatives
include making the exception checked, which was pointed out in a TODO (along
with the caveat that a checked exception is viral, since every caller up the
stack must declare or handle it). Another approach is to make v2's cache
coarser-grained. For example, a database entry could include all of its table
names and functions, which avoids the double check (validating the database
once on its own and again when its table names are fetched).
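The retrying-wrapper option can be sketched with a JDK dynamic proxy, in the
spirit of the HMS retrying client. This is a minimal sketch, not Impala code:
CatalogView, RetryingCatalog, the maxAttempts parameter, and the stand-in
exception class are hypothetical. A getRetryableCatalog() accessor on Frontend
could return such a wrapper.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;

// Local stand-in for Impala's unchecked exception of the same name.
class InconsistentMetadataFetchException extends RuntimeException {
  InconsistentMetadataFetchException(String msg) { super(msg); }
}

/** Hypothetical catalog interface; Impala's real catalog interface differs. */
interface CatalogView {
  List<String> listPartitions(String table);
}

/** Proxy handler that re-invokes any interface method on inconsistent metadata. */
final class RetryingCatalog implements InvocationHandler {
  private final Object delegate;
  private final int maxAttempts;

  private RetryingCatalog(Object delegate, int maxAttempts) {
    this.delegate = delegate;
    this.maxAttempts = maxAttempts;
  }

  /** Wraps an implementation of iface so that every call is retried. */
  @SuppressWarnings("unchecked")
  static <T> T wrap(Class<T> iface, T delegate, int maxAttempts) {
    return (T) Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface},
        new RetryingCatalog(delegate, maxAttempts));
  }

  @Override
  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    for (int attempt = 1; ; attempt++) {
      try {
        return method.invoke(delegate, args);
      } catch (InvocationTargetException e) {
        Throwable cause = e.getCause();
        // Retry only metadata races; rethrow anything else, or the last race after maxAttempts.
        if (!(cause instanceof InconsistentMetadataFetchException) || attempt >= maxAttempts) {
          throw cause;
        }
      }
    }
  }
}
```

Callers that can tolerate a failure, such as getCatalogMetrics, would keep
using the unwrapped catalog, which is the main advantage of making the wrapper
opt-in rather than replacing the catalog everywhere.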

In addition, we need a way to test this. An initial thought is to inject time
delays and check that each operation encounters and retries at least one such
inconsistency.
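A deterministic variant of the proposed test can be sketched as follows.
Instead of time delays, which only make the race likely, this sketch (a
swapped-in technique) forces the first attempt of every named operation to
fail, then checks that each operation still succeeds and saw at least one
injected inconsistency. FaultInjector and Retry are hypothetical test helpers,
and the exception class is a local stand-in.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Local stand-in for Impala's unchecked exception of the same name.
class InconsistentMetadataFetchException extends RuntimeException {
  InconsistentMetadataFetchException(String msg) { super(msg); }
}

/** Hypothetical test helper: the first call of each named operation throws. */
final class FaultInjector {
  private final Map<String, Integer> injected = new HashMap<>();

  /** Simulates the race a delay would expose, once per operation name. */
  <T> T call(String op, Supplier<T> body) {
    if (injected.getOrDefault(op, 0) == 0) {
      injected.merge(op, 1, Integer::sum);
      throw new InconsistentMetadataFetchException("injected for " + op);
    }
    return body.get();
  }

  /** Number of inconsistencies injected for the given operation. */
  int injectedCount(String op) { return injected.getOrDefault(op, 0); }
}

/** Minimal retry loop standing in for the code under test. */
final class Retry {
  static <T> T withRetries(int maxAttempts, Supplier<T> op) {
    for (int attempt = 1; ; attempt++) {
      try {
        return op.get();
      } catch (InconsistentMetadataFetchException e) {
        if (attempt >= maxAttempts) throw e;
      }
    }
  }
}
```

A path that lacks a retry (as "show partitions" does today) would fail on the
first injected inconsistency, which is exactly what such a test should catch.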

> "show partitions" does not retry on InconsistentMetadataFetchException
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-7597
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7597
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: bharath v
>            Assignee: Vuk Ercegovac
>            Priority: Critical
>
> IMPALA-7530 added retries in case LocalCatalog throws 
> InconsistentMetadataFetchException. These retries apply to all code paths 
> taking {{Frontend#createExecRequest()}}. 
> "show partitions" additionally takes {{Frontend#getTableStats()}} and aborts 
> the first time it sees InconsistentMetadataFetchException. 
> We need to make sure all the queries (especially DDLs) retry if they hit this 
> exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
