[ 
https://issues.apache.org/jira/browse/IMPALA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971198#comment-16971198
 ] 

ASF subversion and git services commented on IMPALA-9045:
---------------------------------------------------------

Commit 52c774d2ae9bb9d37a10f51fe5721d11a3ec7416 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=52c774d ]

IMPALA-9045: Filter base directories of open/aborted compactions

Base directories are named base_<write_id>_<transaction_id>.
The <transaction_id> part indicates whether a base directory is fully
written or is still being written by a compaction job. Compaction jobs
don't increase the write id of a table, so the <write_id> part cannot
be used for that.

Before this commit, Impala didn't check the validity of <transaction_id>,
so it could read the contents of a half-written base directory.
With this change, Impala retrieves the valid transaction list from HMS
and checks whether <transaction_id> is committed.
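
The check described above can be sketched roughly as follows. This is an
illustration only, not Impala's frontend code: ValidTxns, parse_base_dir and
usable_base_dirs are hypothetical names, the valid transaction list is
reduced to a high watermark plus sets of open and aborted ids, and the
optional "v" prefix on the transaction id is an assumption about the
on-disk naming.

```python
import re
from typing import List, NamedTuple, Optional, Set

# Assumed naming: base_<write_id>, optionally followed by _<transaction_id>.
# Accepting a "v" prefix on the transaction id is an assumption.
BASE_DIR_RE = re.compile(r"^base_(\d+)(?:_v?(\d+))?$")


class BaseDir(NamedTuple):
    write_id: int
    txn_id: Optional[int]  # None when the name carries no transaction id


class ValidTxns(NamedTuple):
    """Simplified stand-in for the valid transaction list fetched from HMS."""
    high_watermark: int
    open_txns: Set[int]
    aborted_txns: Set[int]

    def is_committed(self, txn_id: int) -> bool:
        # Committed here means: not newer than the snapshot, not open, not aborted.
        return (txn_id <= self.high_watermark
                and txn_id not in self.open_txns
                and txn_id not in self.aborted_txns)


def parse_base_dir(name: str) -> Optional[BaseDir]:
    """Return the parsed base directory, or None if the name is not a base."""
    m = BASE_DIR_RE.match(name)
    if m is None:
        return None
    txn_id = int(m.group(2)) if m.group(2) is not None else None
    return BaseDir(int(m.group(1)), txn_id)


def usable_base_dirs(names: List[str], valid: ValidTxns) -> List[str]:
    """Keep base directories whose compaction transaction is committed.

    Bases without a transaction id are kept; treating them as always
    usable is an assumption of this sketch.
    """
    result = []
    for name in names:
        base = parse_base_dir(name)
        if base is None:
            continue  # not a base directory (e.g. a delta)
        if base.txn_id is None or valid.is_committed(base.txn_id):
            result.append(name)
    return result
```

With a snapshot in which transaction 20 is still open, a call such as
usable_base_dirs(["base_5", "base_6_v15", "base_7_v20"], valid) would drop
base_7_v20 and keep the other two.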

Testing
 * Added an e2e test that simulates in-progress compactions
 * Added a frontend test that filters based on a custom valid txn list

Change-Id: Idb895df38bc075e4767e44a6887dbe3000a19ea6
Reviewed-on: http://gerrit.cloudera.org:8080/14547
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Filter base directories of open/aborted compactions
> ---------------------------------------------------
>
>                 Key: IMPALA-9045
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9045
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.3.0
>            Reporter: Csaba Ringhofer
>            Assignee: Zoltán Borók-Nagy
>            Priority: Critical
>              Labels: impala-acid
>
> Major compactions create directories named base_writeid_visibilityTxnId, which 
> expresses that the directory contains all deltas and bases <= writeId, and that 
> the compaction's transaction is visibilityTxnId. visibilityTxnId is needed to 
> check whether the compaction is open/aborted/committed, and base directories 
> belonging to open/aborted compactions should be ignored.
> Currently Impala only checks the writeId, so if there is an open/aborted 
> compaction, its directory will be used as the base, and base/delta directories 
> with smaller writeIds will be ignored, leading to potential data loss.
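
The failure mode in the description can be illustrated with a small model.
This is a sketch under simplifying assumptions, not Impala's selection
logic: pick_base and visible_deltas are hypothetical helpers, each delta is
reduced to a single write id, and the set of committed transactions is
passed in directly.

```python
from typing import List, Optional, Set, Tuple

# (write_id, visibility_txn_id); visibility_txn_id is None when absent.
Base = Tuple[int, Optional[int]]


def pick_base(bases: List[Base], committed_txns: Set[int],
              check_txn: bool) -> Optional[Base]:
    """Pick the base with the highest write id.

    With check_txn=False only the write id is considered (the behaviour
    described as the bug). With check_txn=True, bases whose compaction
    transaction is not committed are skipped first.
    """
    candidates = [
        b for b in bases
        if not check_txn or b[1] is None or b[1] in committed_txns
    ]
    return max(candidates, key=lambda b: b[0], default=None)


def visible_deltas(delta_write_ids: List[int],
                   base: Optional[Base]) -> List[int]:
    """Deltas at or below the chosen base's write id are ignored."""
    floor = base[0] if base is not None else 0
    return [w for w in delta_write_ids if w > floor]


# base_5 is complete; base_7 belongs to compaction txn 20, which is still open.
bases = [(5, None), (7, 20)]
deltas = [6, 7]
committed: Set[int] = set()

buggy = pick_base(bases, committed, check_txn=False)
fixed = pick_base(bases, committed, check_txn=True)
```

In this model the write-id-only choice selects (7, 20) and ignores deltas 6
and 7, so a reader would see only the half-written base. With the
transaction check, (5, None) is selected and both deltas stay visible.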



