[
https://issues.apache.org/jira/browse/HIVE-20699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eugene Koifman updated HIVE-20699:
----------------------------------
Description:
Currently the Acid compactor is implemented as a generated MR job
({{CompactorMR}}).
It could also be expressed as a Hive query that reads from a given partition
and writes data back to the same partition. This would merge the deltas and
'apply' the delete events. The simplest approach would be to use Insert Overwrite,
but that would change all ROW__IDs, which we don't want.
We need to implement this in a way that preserves ROW__IDs and creates a new
{{base_x}} directory to handle Major compaction.
Minor compaction will be investigated separately.
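
To make the requirement concrete, here is a minimal in-memory sketch of what Major compaction has to do logically: merge the base and delta rows, apply the delete events, and keep every surviving ROW__ID unchanged. It is only an illustration, not the proposed query or the {{CompactorMR}} logic. The names {{RowId}} and {{major_compact}} are hypothetical, and modelling ROW__ID as a (write id, bucket, row id) triple is an assumption made for the example.

```python
from typing import NamedTuple


class RowId(NamedTuple):
    # Hypothetical stand-in for ROW__ID, modelled as a triple for this sketch.
    write_id: int
    bucket: int
    row_id: int


def major_compact(base, deltas, delete_deltas):
    """Merge base and delta rows, drop deleted rows, keep ROW__IDs as they are.

    base          -- list of (RowId, payload) pairs
    deltas        -- list of lists of (RowId, payload) pairs
    delete_deltas -- list of lists of RowId values (the delete events)
    """
    # Collect every ROW__ID that a delete event refers to.
    deleted = {rid for delta in delete_deltas for rid in delta}

    merged = {}
    for source in [base, *deltas]:
        for rid, payload in source:
            if rid not in deleted:
                # The identifier is carried over untouched, unlike an
                # Insert Overwrite, which would assign new ones.
                merged[rid] = payload

    # The result plays the role of the contents of the new base directory.
    return sorted(merged.items())


base = [(RowId(1, 0, 0), "a"), (RowId(1, 0, 1), "b")]
deltas = [[(RowId(2, 0, 0), "c")]]
delete_deltas = [[RowId(1, 0, 1)]]
new_base = major_compact(base, deltas, delete_deltas)
```

In this sketch the row deleted by the delete event is gone from {{new_base}}, while the two surviving rows keep the identifiers they had before compaction.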
> Query based compactor for full CRUD Acid tables
> -----------------------------------------------
>
> Key: HIVE-20699
> URL: https://issues.apache.org/jira/browse/HIVE-20699
> Project: Hive
> Issue Type: New Feature
> Components: Transactions
> Affects Versions: 3.1.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major