[ https://issues.apache.org/jira/browse/IMPALA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971198#comment-16971198 ]
ASF subversion and git services commented on IMPALA-9045:
---------------------------------------------------------
Commit 52c774d2ae9bb9d37a10f51fe5721d11a3ec7416 in impala's branch
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=52c774d ]
IMPALA-9045: Filter base directories of open/aborted compactions
Base directories are in the format of base_<write_id>_<transaction_id>.
The <transaction_id> part helps to decide whether a base directory
is fully written or is still being written by a compaction job.
Compaction jobs don't increase the write id of a table, hence the
<write_id> part cannot be used for that.
Before this commit, Impala didn't check whether <transaction_id> was
committed, so it might read the contents of a half-written base directory.
With this change Impala retrieves the valid transaction list from HMS
and checks if <transaction_id> is committed.
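
The check described above can be sketched roughly as follows. This is an
illustration only, not the code from the commit: ValidTxnSnapshot,
parse_base_dir and is_readable_base are hypothetical names, the snapshot
class is a simplified stand-in for the valid transaction list retrieved
from HMS, and the optional "v" before the transaction id as well as the
handling of a base directory without a transaction id are assumptions.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Assumed layout: base_<write_id> or base_<write_id>_<transaction_id>.
# An optional "v" before the transaction id is tolerated (assumption).
BASE_DIR_RE = re.compile(r"^base_(\d+)(?:_v?(\d+))?$")


@dataclass(frozen=True)
class ValidTxnSnapshot:
    """Hypothetical, simplified stand-in for the valid txn list from HMS."""
    high_watermark: int
    open_or_aborted: FrozenSet[int] = frozenset()

    def is_committed(self, txn_id: int) -> bool:
        # Committed here means: not above the high watermark and
        # not listed as open or aborted.
        return (txn_id <= self.high_watermark
                and txn_id not in self.open_or_aborted)


def parse_base_dir(name: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (write_id, transaction_id or None), or None if not a base dir."""
    m = BASE_DIR_RE.match(name)
    if m is None:
        return None
    write_id = int(m.group(1))
    txn_id = int(m.group(2)) if m.group(2) is not None else None
    return write_id, txn_id


def is_readable_base(name: str, txns: ValidTxnSnapshot) -> bool:
    """True if the base directory may be read under the given snapshot."""
    parsed = parse_base_dir(name)
    if parsed is None:
        return False
    _, txn_id = parsed
    # Assumption: a base directory without a transaction id is accepted.
    return txn_id is None or txns.is_committed(txn_id)
```

With a snapshot whose high watermark is 20 and where transaction 15 is
still open, base_5_12 would be accepted while base_5_15 and base_5_25
would be filtered out.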
Testing
* Added an e2e test that simulates in-progress compactions
* Added frontend test that filters based on custom valid txn list
Change-Id: Idb895df38bc075e4767e44a6887dbe3000a19ea6
Reviewed-on: http://gerrit.cloudera.org:8080/14547
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Filter base directories of open/aborted compactions
> ---------------------------------------------------
>
> Key: IMPALA-9045
> URL: https://issues.apache.org/jira/browse/IMPALA-9045
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.3.0
> Reporter: Csaba Ringhofer
> Assignee: Zoltán Borók-Nagy
> Priority: Critical
> Labels: impala-acid
>
> Major compactions create directories named base_writeid_visibilityTxnId. The name
> expresses that the directory contains all deltas + bases <= writeId, and that the
> compaction's transaction is visibilityTxnId. visibilityTxnId is needed to
> check whether the compaction is open/aborted/committed, and base directories
> belonging to open/aborted compactions should be ignored.
> Currently Impala only checks the writeId, so if there is an open/aborted
> compaction, its directory will be used as the base, and base/delta directories
> with smaller writeIds will be ignored, leading to potential data loss.
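
The failure mode in the quoted description can be reproduced with a small
simulation. This is a sketch under simplifying assumptions, not Impala's
directory selection logic: pick_base and visible_dirs are hypothetical
helpers, each base is modelled as a (write_id, txn_id) pair, and each
delta is modelled as a single write id.

```python
from typing import Iterable, List, Optional, Set, Tuple

Base = Tuple[int, int]  # (write_id, visibility_txn_id), simplified model


def pick_base(bases: Iterable[Base], committed: Set[int],
              check_txn: bool) -> Optional[Base]:
    """Pick the base with the highest write id, optionally skipping
    bases whose compaction transaction is not committed."""
    candidates = [b for b in bases if not check_txn or b[1] in committed]
    return max(candidates, key=lambda b: b[0], default=None)


def visible_dirs(bases: Iterable[Base], deltas: Iterable[int],
                 committed: Set[int],
                 check_txn: bool) -> Tuple[Optional[Base], List[int]]:
    """Return the chosen base and the deltas that are still read on top of it."""
    base = pick_base(bases, committed, check_txn)
    floor = base[0] if base is not None else 0
    # Deltas at or below the chosen base's write id are treated as
    # already contained in that base and are therefore skipped.
    return base, [d for d in deltas if d > floor]


# Base (2, 7) comes from a committed compaction, base (5, 12) from an open one.
bases = [(2, 7), (5, 12)]
deltas = [3, 4, 5]
committed = {7}

# Checking only the write id picks the half-written base and drops deltas 3..5.
without_check = visible_dirs(bases, deltas, committed, check_txn=False)

# Checking the transaction as well keeps the committed base and all deltas.
with_check = visible_dirs(bases, deltas, committed, check_txn=True)
```

In the first call the rows stored in deltas 3 to 5 are only reachable
through the incomplete base, which is the potential data loss the report
describes.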