[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Saket Saurabh updated HIVE-14035:
---------------------------------
Attachment: HIVE-14035.14.patch
Patch #14 significantly refactors how split strategies are chosen in the ACID
split-update case and now sets the isOriginal flag correctly on a per-split
basis. When split-update is enabled, a split on a base file can be one of three
types: a split on an original_base, a split on a compacted_base, or a split on
an insert_delta. A single job may therefore end up with a set of OrcSplits that
covers both original and insert_delta files. In such cases it is very important
that the isOriginal flag is set correctly for each split; otherwise the objects
that the split strategies instantiate from it will be set up incorrectly. This
patch takes care of that.
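
To illustrate the per-split flag described above, here is a minimal sketch and
not the code from the patch. The class SplitKindSketch, the enum BaseKind, and
the helpers classify and isOriginal are hypothetical names; the directory
prefixes (base_, delta_, delete_delta_) are assumed from the usual ACID layout,
and anything else is treated as a pre-ACID original file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derive isOriginal for each split from the kind of
// base it reads, instead of using one flag for the whole job.
class SplitKindSketch {
    enum BaseKind { ORIGINAL_BASE, COMPACTED_BASE, INSERT_DELTA }

    // Assumption: the kind of base can be inferred from the name of the
    // directory that holds the file.
    static BaseKind classify(String parentDirName) {
        if (parentDirName.startsWith("delete_delta_")) {
            // A delete_delta is never the base of a split.
            throw new IllegalArgumentException("not a base: " + parentDirName);
        }
        if (parentDirName.startsWith("base_")) {
            return BaseKind.COMPACTED_BASE;
        }
        if (parentDirName.startsWith("delta_")) {
            return BaseKind.INSERT_DELTA;
        }
        // Anything else is assumed to hold original (pre-ACID) files.
        return BaseKind.ORIGINAL_BASE;
    }

    // Only splits over original files carry isOriginal = true.
    static boolean isOriginal(BaseKind kind) {
        return kind == BaseKind.ORIGINAL_BASE;
    }

    // Compute the flag independently for every directory in one job.
    static List<Boolean> flagsFor(List<String> parentDirNames) {
        List<Boolean> flags = new ArrayList<>();
        for (String dir : parentDirNames) {
            flags.add(isOriginal(classify(dir)));
        }
        return flags;
    }

    public static void main(String[] args) {
        List<String> dirs = new ArrayList<>();
        dirs.add("ds=2016-07-01");          // original files, no ACID directory
        dirs.add("base_0000005");           // compacted base
        dirs.add("delta_0000006_0000006");  // insert delta
        System.out.println(flagsFor(dirs));
    }
}
```

The point of the sketch is only that one job can mix true and false values, so
the flag has to travel with each split.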
Additionally, the patch now also optimizes the case in which uncovered buckets
had to be processed because a split had no base (previously possible when there
were only deltas). With split-update enabled, every split now has a base,
because there is no point in having a split that only reads the delete_deltas.
(Minor compaction is not a concern here: it always creates a single split
through its own separate logic, which this patch does not modify.)
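
The rule that every split has a base can be sketched roughly as follows. This
is a simplified illustration under stated assumptions, not the planner from the
patch: BasePerSplitSketch, PlannedSplit, and plan are hypothetical names,
directories are plain strings, and bucket handling is ignored entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: delete_delta directories never form a split of
// their own; they are only attached to splits that have a base.
class BasePerSplitSketch {
    static final class PlannedSplit {
        final String baseDir;
        final List<String> deleteDeltas;

        PlannedSplit(String baseDir, List<String> deleteDeltas) {
            this.baseDir = baseDir;
            this.deleteDeltas = deleteDeltas;
        }
    }

    static List<PlannedSplit> plan(List<String> dirs) {
        List<String> bases = new ArrayList<>();
        List<String> deleteDeltas = new ArrayList<>();
        for (String dir : dirs) {
            if (dir.startsWith("delete_delta_")) {
                deleteDeltas.add(dir);
            } else {
                // original files, a compacted base, or an insert delta
                bases.add(dir);
            }
        }
        List<PlannedSplit> splits = new ArrayList<>();
        for (String base : bases) {
            // Each base-backed split carries the delete events it must apply.
            splits.add(new PlannedSplit(base, deleteDeltas));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> dirs = new ArrayList<>();
        dirs.add("base_0000005");
        dirs.add("delta_0000006_0000006");
        dirs.add("delete_delta_0000007_0000007");
        for (PlannedSplit s : plan(dirs)) {
            System.out.println(s.baseDir + " with deletes " + s.deleteDeltas);
        }
    }
}
```

In this sketch, an input that contains only delete_delta directories yields no
splits at all, which mirrors the reasoning given above.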
Tests for all of these changes have been added to TestInputOutputFormat,
covering various scenarios. The patch also addresses the comments on RB.
> Enable predicate pushdown to delta files created by ACID Transactions
> ---------------------------------------------------------------------
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
> Issue Type: New Feature
> Components: Transactions
> Reporter: Saket Saurabh
> Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch,
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch,
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch,
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch,
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.patch
>
>
> In the current Hive version, delta files created by ACID transactions do not
> allow predicate pushdown if they contain any update/delete events. This is
> done to preserve correctness when following a multi-version approach during
> event collapsing, where an update event overwrites an existing insert event.
> This JIRA proposes to split an update event into a combination of a delete
> event followed by a new insert event, which enables predicate pushdown to
> all delta files without breaking correctness. To support backward
> compatibility for this feature, this JIRA also proposes to add some form of
> versioning to ACID so that different versions of ACID transactions can
> coexist.
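
The split-update idea in the description above can be sketched as follows. This
is a toy model under stated assumptions, not Hive's reader: SplitUpdateSketch,
Insert, and read are hypothetical names, a row is a single int value, and the
row identifier is simplified to one long.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical sketch: an update is stored as a delete of the old row plus
// an insert of the new row, so the predicate can be applied to insert
// events before the delete events are taken into account.
class SplitUpdateSketch {
    static final class Insert {
        final long rowId;
        final int value;

        Insert(long rowId, int value) {
            this.rowId = rowId;
            this.value = value;
        }
    }

    static List<Integer> read(List<Insert> inserts,
                              Set<Long> deletedRowIds,
                              IntPredicate predicate) {
        List<Integer> result = new ArrayList<>();
        for (Insert event : inserts) {
            if (!predicate.test(event.value)) {
                continue; // filtered by the pushed-down predicate
            }
            if (deletedRowIds.contains(event.rowId)) {
                continue; // row was deleted, or replaced by an update
            }
            result.add(event.value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Row 1 is inserted with value 5, then updated to 50. The update is
        // recorded as delete(row 1) followed by insert(row 2, value 50).
        List<Insert> inserts = new ArrayList<>();
        inserts.add(new Insert(1L, 5));
        inserts.add(new Insert(2L, 50));
        Set<Long> deleted = new HashSet<>();
        deleted.add(1L);

        System.out.println(read(inserts, deleted, v -> v < 10));  // prints []
        System.out.println(read(inserts, deleted, v -> v >= 10)); // prints [50]
    }
}
```

The stale value 5 passes the filter v < 10 but is then dropped by the delete
event, so filtering early does not resurrect an overwritten row.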