[
https://issues.apache.org/jira/browse/HIVE-18098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eugene Koifman reassigned HIVE-18098:
-------------------------------------
Assignee: Eugene Koifman
> Add support for Export/Import for Acid tables
> ---------------------------------------------
>
> Key: HIVE-18098
> URL: https://issues.apache.org/jira/browse/HIVE-18098
> Project: Hive
> Issue Type: New Feature
> Components: Transactions
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
>
> How should this work?
> For regular tables, export just copies the files under the table root to a
> specified directory.
> This doesn't make sense for Acid tables:
> * Some data may belong to aborted transactions
> * Transaction IDs are embedded into data/file names. You'd have to export
> delta/ and base/, each of which may have files with the same names, e.g.
> bucket_00000.
> * On import these IDs won't make sense in a different cluster or even a
> different table (which may have delta_x_x, for example, for the same x, but
> with different data of course).
> * Export creates a _metadata file with column types, storage format, etc.
> Perhaps it can include info about aborted IDs (if the whole file can't be
> skipped).
> * Even importing into the same table on the same cluster may be a problem.
> For example delta_5_5/ existed at the time of export and was included in the
> export. But 2 days later it may not exist because it was compacted and
> cleaned.
> * If importing back into the same table on the same cluster, the data could
> be imported into a different transaction (assuming per table writeIDs) w/o
> having to remap the IDs in the rows themselves.
> * Support Import Overwrite?
> * Support Import as a new txn with remapping of ROW__IDs? The new writeID can
> be stored in a delta_x_x/_meta_data and ROW__IDs can be remapped at read time
> (like isOriginal) and made permanent by compaction.
> * It doesn't seem reasonable to import acid data into a non-acid table.
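
To make the path-level concerns above concrete, here is a rough Python sketch. It is not Hive code and not a proposed design: all function names are hypothetical, directory names follow the unpadded delta_x_y / base_x form used in this description, and only whole-directory bookkeeping is covered. Row-level ROW__ID remapping is out of scope.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Set


class AcidDir(NamedTuple):
    kind: str    # "base" or "delta"
    min_id: int  # first ID in delta_x_y; for base_x both fields hold x
    max_id: int  # second ID in delta_x_y


_DELTA = re.compile(r"^delta_(\d+)_(\d+)$")
_BASE = re.compile(r"^base_(\d+)$")


def parse_acid_dir(name: str) -> Optional[AcidDir]:
    """Parse a directory name; return None if it is not base_x or delta_x_y."""
    m = _DELTA.match(name)
    if m:
        return AcidDir("delta", int(m.group(1)), int(m.group(2)))
    m = _BASE.match(name)
    if m:
        return AcidDir("base", int(m.group(1)), int(m.group(1)))
    return None


def flat_copy_collisions(paths: Iterable[str]) -> Set[str]:
    """File names that would overwrite each other in one flat export directory."""
    counts = Counter(p.rsplit("/", 1)[-1] for p in paths if "/" in p)
    return {name for name, n in counts.items() if n > 1}


def drop_aborted(paths: Iterable[str], aborted: Set[int]) -> List[str]:
    """Skip files under single-ID deltas (delta_x_x) whose ID is aborted.

    Assumption: a delta spanning several IDs cannot be skipped as a whole,
    so it is kept and would need the aborted IDs recorded elsewhere.
    """
    kept = []
    for p in paths:
        info = parse_acid_dir(p.split("/", 1)[0])
        whole_dir_aborted = (
            info is not None
            and info.kind == "delta"
            and info.min_id == info.max_id
            and info.min_id in aborted
        )
        if not whole_dir_aborted:
            kept.append(p)
    return kept


def delta_dir_for(write_id: int) -> str:
    """Directory name for data imported under a single new write ID."""
    return f"delta_{write_id}_{write_id}"
```

With paths such as base_3/bucket_00000 and delta_5_5/bucket_00000, flat_copy_collisions reports bucket_00000, which is why a plain file copy is not enough. delta_dir_for only names the target directory for an import done as a new transaction; it says nothing about how the rows inside are remapped.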
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)