[
https://issues.apache.org/jira/browse/IMPALA-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827133#comment-16827133
]
ASF subversion and git services commented on IMPALA-8406:
---------------------------------------------------------
Commit 114712031f2293af5b3d9509776f88c23d3fa0fc in impala's branch
refs/heads/master from Todd Lipcon
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1147120 ]
IMPALA-8454 (part 1): Refactor file descriptor loading code
This refactors various file-descriptor loading code out of HdfsTable
into new standalone classes. In order to support ACID tables, we'll need
to make various changes to these bits of code, and having them extracted
and cleaned up will make that easier.
This consolidates all of the places in which we list partition
directories into one method which does the appropriate thing regardless
of situation.
This has a small behavior change related to IMPALA-8406. Previously, if
one or more partitions failed to refresh during a table refresh, the
other partitions might still be refreshed even though an error was
returned. The changes to those other partitions would not become visible
until some other operation caused the table's catalog version number to
increase.
Rather than tackle that problem in this "refactor" patch, this patch
only slightly improves the behavior: we'll atomically update either all
partitions or none of them, but we might still add new partitions noticed
by the REFRESH, and might still update other HMS metadata.
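The all-or-nothing idea can be sketched as follows. This is an
illustration only, not the code in this patch: Table,
refresh_all_or_nothing, and failing_lister are hypothetical names, the
real catalog code is Java, and the sketch does not model newly added
partitions or other HMS metadata. New file lists are staged off to the
side and published, together with a version bump, only once every
partition has been listed successfully.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Table:
    """Hypothetical stand-in for a catalog table (not Impala's HdfsTable)."""
    version: int = 1
    # Partition name -> file names currently recorded for that partition.
    partitions: Dict[str, List[str]] = field(default_factory=dict)


def refresh_all_or_nothing(
    table: Table, list_files: Callable[[str], Iterable[str]]
) -> None:
    """Reload every partition's file list, publishing only if all succeed."""
    staged: Dict[str, List[str]] = {}
    for name in table.partitions:
        # If listing fails, the exception propagates before `table`
        # has been modified at all.
        staged[name] = list(list_files(name))
    # Every listing succeeded: swap in the new state and bump the
    # version in the same step, so the object never changes without
    # its version changing.
    table.partitions = staged
    table.version += 1


def failing_lister(name: str) -> List[str]:
    """Hypothetical lister that fails for partition 'p2'."""
    if name == "p2":
        raise OSError("cannot list files for partition " + name)
    return [name + "/new"]
```

With a table holding partitions p1 and p2, calling
refresh_all_or_nothing(table, failing_lister) raises and leaves both the
file lists and the version untouched, even though p1 was listed
successfully before p2 failed.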
This patch may end up slightly improving various other code paths that
refresh file descriptor lists. We used to have slightly different ways
of doing this in three different places, with different sets of
optimizations. Now we do it all in one place, and we pull all the known
tricks.
Change-Id: I59edf493b9ba38be5f556b4795a7684d9c9e3a07
Reviewed-on: http://gerrit.cloudera.org:8080/12950
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Failed REFRESH can partially modify table without bumping version number
> ------------------------------------------------------------------------
>
> Key: IMPALA-8406
> URL: https://issues.apache.org/jira/browse/IMPALA-8406
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 3.2.0
> Reporter: Todd Lipcon
> Priority: Major
>
> Currently, various incremental operations in the catalogd modify Table
> objects in place, including REFRESH, which modifies each partition. In this
> case, if one partition fails to refresh (e.g. due to incorrect partitions or
> some other file access problem), other partitions can still be modified,
> either because they were modified first (in a non-parallel operation) or
> modified in parallel (for REFRESH).
> In this case, the REFRESH operation will throw an Exception back to the user,
> but in fact it has modified the catalog entry. The version number, however,
> is not bumped, which breaks the catalog invariant that an object doesn't
> change unless its version number changes.
> This also produces some unexpected behavior such as:
> - SHOW FILES IN t;
> - REFRESH t; -- fails with an error
> - SHOW FILES IN t; -- shows the same result as originally
> - ALTER TABLE t SET UNCACHED; -- bumps the version number due to unrelated
> operation
> - SHOW FILES IN t; -- the set of files has changed due to the earlier
> partially-complete REFRESH