[
https://issues.apache.org/jira/browse/IMPALA-10801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397860#comment-17397860
]
ASF subversion and git services commented on IMPALA-10801:
----------------------------------------------------------
Commit 494588b601d52e18acecc2b97128a2dc5bab6bc1 in impala's branch
refs/heads/master from Yu-Wen Lai
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=494588b ]
IMPALA-10801: Check the latest compaction Id before serving ACID table
Since compactions don't advance the write id, we can't tell whether a
table/partition has been compacted by comparing writeIdList. As a
result, CatalogD can serve obsolete file metadata and cause a
runtime error.
To fix this issue, we introduced an HMS API that returns the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we compare the cached compaction id with the latest
compaction id before serving. If a newer compaction has happened, we
cache the latest compaction id and refresh the file metadata.
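
The check described above can be sketched roughly as follows. This is an
illustrative Python model only, not the actual CatalogD code (which is
Java): CachedPartition, serve_file_metadata, and the two callbacks are
hypothetical names standing in for the cached catalog entry, the HMS
lookup from HIVE-24828, and the file metadata reload.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CachedPartition:
    """Hypothetical stand-in for a cached table/partition entry."""
    name: str
    last_compaction_id: int
    file_metadata: List[str]


def serve_file_metadata(
    cached: CachedPartition,
    fetch_latest_compaction_id: Callable[[str], int],  # assumed HMS lookup
    reload_file_metadata: Callable[[str], List[str]],  # assumed filesystem reload
) -> List[str]:
    """Return file metadata, refreshing it first if a newer compaction exists."""
    latest_id = fetch_latest_compaction_id(cached.name)
    if latest_id > cached.last_compaction_id:
        # A compaction ran after the metadata was cached, so the cached
        # file list may point at files that were rewritten or cleaned up.
        cached.file_metadata = reload_file_metadata(cached.name)
        cached.last_compaction_id = latest_id
    return cached.file_metadata
```

In this sketch the cached id is updated together with the file metadata,
so a later request with the same latest compaction id is served from the
cache without another reload.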
This patch also changes how the existing table is replaced after a
full table reload. The existing approach replaces the table if the
catalog version has not changed. Transactional tables are more
complex because file metadata refreshing and full table reloading can
happen together. For them, we can use writeIdList to decide whether
to replace the table: as long as the updated table has a more recent
writeIdList than the existing one, it is safe to replace the table.
For non-transactional tables, we keep the original behavior.
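
The replacement rule above can be sketched as a small decision function.
This is a simplified Python model under stated assumptions, not the
patch itself: TableSnapshot and should_replace are hypothetical names,
and a writeIdList is reduced to a single high-watermark integer, whereas
a real writeIdList carries more state than that.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableSnapshot:
    """Hypothetical, simplified view of a table held by the catalog."""
    catalog_version: int
    is_transactional: bool
    # Assumption: "more recent writeIdList" is modeled as a larger
    # high watermark. The real comparison is richer than this.
    write_id_high_watermark: Optional[int] = None


def should_replace(
    existing: TableSnapshot,
    updated: TableSnapshot,
    version_at_load_start: int,
) -> bool:
    """Decide whether a freshly reloaded table may replace the cached one."""
    if existing.is_transactional and updated.is_transactional:
        # Transactional tables: replace only if the reloaded table
        # reflects a more recent writeIdList than the cached one.
        return updated.write_id_high_watermark > existing.write_id_high_watermark
    # Non-transactional tables: original behavior, replace only if the
    # cached table's catalog version did not change during the reload.
    return existing.catalog_version == version_at_load_start
```

Under these assumptions, a concurrent file metadata refresh that bumps
the catalog version does not block replacement of a transactional table,
as long as the reloaded copy has the newer write id state.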
Testing:
- Add several tests in PartialCatalogInfoWriteIdTest
Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Reviewed-on: http://gerrit.cloudera.org:8080/17697
Reviewed-by: Vihang Karajgaonkar <[email protected]>
Tested-by: Vihang Karajgaonkar <[email protected]>
> Check the latest compaction Id before serving request
> -----------------------------------------------------
>
> Key: IMPALA-10801
> URL: https://issues.apache.org/jira/browse/IMPALA-10801
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Yu-Wen Lai
> Assignee: Yu-Wen Lai
> Priority: Major
>
> Cache compaction Id for a given table/file-metadata in CatalogD.
> Whenever there is a read request to CatalogD, get the latest compaction event
> Id from HMS, compare it with what is cached in CatalogD, and based on that
> decide whether to serve the data from cache or to refresh it from the
> filesystem. This can avoid notification based cache invalidation.
> Also, since there will be an open txn for the current long running query
> which is being served from CatalogD, we can be sure that current
> file-metadata being served is not already deleted by the cleaner.
> This proposal will use a new HMS API
> (https://issues.apache.org/jira/browse/HIVE-24828) to get the latest
> compaction id for a table.