[
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eugene Koifman updated HIVE-18814:
----------------------------------
Attachment: HIVE-18814.03.patch
> Support Add Partition For Acid tables
> -------------------------------------
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
> Issue Type: New Feature
> Components: Transactions
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
> Attachments: HIVE-18814.01.patch, HIVE-18814.02.patch,
> HIVE-18814.03.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and
> at read time the data is decorated with row__id but the original transaction
> is 0. I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't
> going to generate duplicate IDs but it could violate Snapshot Isolation in
> multi stmt txns. Suppose txnid:7 runs {{select * from T}}. Then txnid:8
> adds a partition to T. Now if txnid:7 runs the same query again, it will see
> the data in the new partition.
> This can't be released like this, since a delete on this data (added via Add
> partition) will use row_ids with txnid:0, so a later upgrade that sees
> un-compacted data may generate row_ids with a different txnid (assuming this
> is fixed by then).
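
The Snapshot Isolation concern described above can be illustrated with a toy visibility check. This is only a sketch: it assumes a reader's snapshot can be reduced to a single high-water mark and that a row is visible when the id it is decorated with is at or below that mark. Real Hive snapshots also track open and aborted transactions, and the names `Row` and `visible` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    txn_id: int   # id the row is decorated with at read time
    value: str


def visible(rows, high_water_mark):
    """Return values of rows whose id is at or below the reader's snapshot mark."""
    return [r.value for r in rows if r.txn_id <= high_water_mark]


# txnid:7 takes its snapshot and reads the table for the first time.
table = [Row(txn_id=5, value="existing")]
snapshot_of_txn7 = 7
first_read = visible(table, snapshot_of_txn7)

# txnid:8 adds a partition; its rows end up decorated with id 0.
table_after_add = table + [Row(txn_id=0, value="added-partition")]
second_read = visible(table_after_add, snapshot_of_txn7)

# If the rows carried the adder's id (8) instead, the snapshot would hide them.
table_with_real_id = table + [Row(txn_id=8, value="added-partition")]
repeatable_read = visible(table_with_real_id, snapshot_of_txn7)
```

In this model `second_read` contains the newly added rows while `repeatable_read` matches `first_read`, which is the difference between decorating with 0 and decorating with an id allocated at add time.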
>
> One option is to follow the Load Data approach: create a new delta_x_x/ and
> move/copy the data there.
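
A rough sketch of what the move variant of this option might look like on a local filesystem. It assumes the delta directory is named `delta_<x>_<x>` with 7-digit zero padding; the helper names (`delta_dir_name`, `move_into_delta`), the paths, and the write id 8 are made up for illustration and are not Hive code.

```python
import shutil
import tempfile
from pathlib import Path


def delta_dir_name(write_id: int) -> str:
    # Assumed naming: delta_<min>_<max>, both equal, 7-digit zero padding.
    return f"delta_{write_id:07d}_{write_id:07d}"


def move_into_delta(source_dir: Path, partition_dir: Path, write_id: int) -> Path:
    """Move every regular file from source_dir into a new delta dir under partition_dir."""
    delta = partition_dir / delta_dir_name(write_id)
    # Fail rather than silently merge into an existing delta directory.
    delta.mkdir(parents=True, exist_ok=False)
    for f in sorted(source_dir.iterdir()):
        if f.is_file():
            shutil.move(str(f), str(delta / f.name))
    return delta


# Usage on a throwaway directory tree.
root = Path(tempfile.mkdtemp())
external = root / "staging"
external.mkdir()
(external / "000000_0").write_text("row data")
created = move_into_delta(external, root / "warehouse" / "t" / "p=1", write_id=8)
```

After the move the data lives inside the table tree, so nothing outside the table's control can touch it later.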
>
> Another is to allocate a new writeid and save it in Partition metadata. This
> could then be used to decorate data with ROW__IDs. This avoids move/copy but
> retains data "outside" of the table tree, which makes it more likely that this
> data will be modified in some way. That can really break things if done after
> any SQL update/delete on this data has happened.
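
A minimal sketch of the second option, assuming a ROW__ID can be modelled as a (writeid, bucketid, rowid) triple and that the write id allocated at add time is stored on the partition. `PartitionMeta`, its `write_id` field, and `decorate` are hypothetical names, not the metastore API.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

RowId = Tuple[int, int, int]  # (write_id, bucket_id, row_id), an assumed shape


@dataclass
class PartitionMeta:
    location: str
    write_id: int  # hypothetical field: allocated when the partition is added


def decorate(meta: PartitionMeta, rows: List[str], bucket_id: int = 0) -> Iterator[Tuple[RowId, str]]:
    """Yield (row id, row) pairs using the partition's stored write id."""
    for row_id, row in enumerate(rows):
        yield (meta.write_id, bucket_id, row_id), row


part = PartitionMeta(location="/data/outside/table/tree", write_id=8)
decorated = list(decorate(part, ["a", "b"]))
```

Because the row ids here are derived from position in the file, any later rewrite of the files at `location` would change which row a given id refers to, which is the fragility noted above.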
>
> Add Partition performs no validation (except of the partition spec), so any
> file in any format can be added. It allows adding to bucketed tables as well.
> Seems like a very dangerous command. Maybe a better option is to block it
> and advise using Load Data. Alternatively, make this do an Add partition
> metadata op followed by Load Data.
>
>