[jira] [Commented] (IMPALA-10801) Check the latest compaction Id before serving request

ASF subversion and git services (Jira) Thu, 02 Dec 2021 18:22:25 -0800


    [ 
https://issues.apache.org/jira/browse/IMPALA-10801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452699#comment-17452699
 ]


ASF subversion and git services commented on IMPALA-10801:
----------------------------------------------------------

Commit 4077bc849ae14bb92a463aeeb6c8f5c1fca658c9 in impala's branch 
refs/heads/master from Yu-Wen Lai
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4077bc8 ]

IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after
Compaction

After a compaction runs in Hive (on a Hive ACID table), queries issued in
Impala may fail with a FileNotFoundException if the files have already
been removed by the Hive cleaner.

In IMPALA-10801, catalogd checks the latest compaction id before serving
metadata. However, coordinators don't take advantage of that.
Coordinators have their own local cache, so we have to do the same
check for coordinators as well. In addition, we need to attach a
writeIdList to requests that fetch file metadata. Since this check
adds overhead to queries, we introduce a flag, auto_check_compaction,
and set it to false by default for now. We will look for more
efficient ways to do compaction checking in the future.
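
As a rough illustration of the coordinator-side behaviour described above, the following Python sketch gates the compaction check behind a flag that defaults to false. Only the flag name auto_check_compaction comes from the commit message. The function, its parameters, and the (compaction id, file list) cache layout are hypothetical and are not Impala's actual code, and attaching the writeIdList is not modelled.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cache layout: table name -> (compaction id at load time, file list).
LocalCache = Dict[str, Tuple[int, List[str]]]


def get_file_metadata(
    table: str,
    local_cache: LocalCache,
    latest_compaction_id: Callable[[str], int],
    reload: Callable[[str], Tuple[int, List[str]]],
    auto_check_compaction: bool = False,
) -> List[str]:
    """Return file metadata for `table`, optionally checking for a newer compaction.

    `latest_compaction_id` and `reload` are stand-ins for calls to the
    metastore and to catalogd; both are assumptions made for this sketch.
    """
    cached = local_cache.get(table)
    if cached is not None:
        cached_id, files = cached
        if not auto_check_compaction:
            # Flag off (the default): serve the cache and skip the extra lookup.
            return files
        if latest_compaction_id(table) <= cached_id:
            # No compaction since the entry was loaded, so it is still usable.
            return files
    # Cache miss or stale entry: reload and remember the new compaction id.
    new_id, files = reload(table)
    local_cache[table] = (new_id, files)
    return files
```

With the flag off, a stale entry is still served and no extra lookup is made, which is the trade-off the commit message describes.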

Tests:
Added unit tests to CatalogdMetaProviderTest

Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Reviewed-on: http://gerrit.cloudera.org:8080/18043
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Check the latest compaction Id before serving request
> -----------------------------------------------------
>
>                 Key: IMPALA-10801
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10801
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Yu-Wen Lai
>            Assignee: Yu-Wen Lai
>            Priority: Major
>
> Cache compaction Id for a given table/file-metadata in CatalogD.
> Whenever there is a read request to CatalogD, get the latest compaction event 
> Id from HMS, compare it with what is cached in CatalogD, and based on that 
> decide whether to serve the data from cache or to refresh it from the 
> filesystem. This can avoid notification-based cache invalidation.
> Also, since there will be an open txn for the current long-running query 
> which is being served from CatalogD, we can be sure that current 
> file-metadata being served is not already deleted by the cleaner.
> This proposal will use a new HMS API 
> (https://issues.apache.org/jira/browse/HIVE-24828) to get the latest 
> compaction id for a table.
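
The check proposed in the description can be sketched in Python as follows. This is a minimal sketch, not the catalogd implementation: the class names and the two callables (one standing in for the HMS lookup of the latest compaction id, one for listing files from the filesystem) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CachedFileMetadata:
    """Hypothetical cache entry: the file list and the compaction id it was loaded at."""
    files: List[str]
    compaction_id: int


class TableMetadataCache:
    """Serves file metadata from cache unless a newer compaction has happened."""

    def __init__(
        self,
        fetch_latest_compaction_id: Callable[[str], int],
        list_files: Callable[[str], List[str]],
    ) -> None:
        # Both callables are assumptions standing in for HMS and filesystem access.
        self._fetch_latest_compaction_id = fetch_latest_compaction_id
        self._list_files = list_files
        self._entries: Dict[str, CachedFileMetadata] = {}
        self.refresh_count = 0  # exposed only to make the behaviour observable

    def get_files(self, table: str) -> List[str]:
        latest = self._fetch_latest_compaction_id(table)
        entry = self._entries.get(table)
        if entry is None or entry.compaction_id < latest:
            # Not cached yet, or compacted since loading: refresh from the filesystem.
            entry = CachedFileMetadata(self._list_files(table), latest)
            self._entries[table] = entry
            self.refresh_count += 1
        return entry.files
```

Every read pays for one compaction-id lookup, but the file listing is reloaded only when that id has advanced.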



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
