[ https://issues.apache.org/jira/browse/HIVE-21052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152785#comment-17152785 ]

Amogh Margoor commented on HIVE-21052:
--------------------------------------

We have created a Spark data source that can read/write Hive ACID tables: 
[https://github.com/qubole/spark-acid/]. Users of it are already hitting this 
particular issue with dynamic partitions against Hive Metastore 3.1.1. To 
reiterate the issue: no entry is made in TXN_COMPONENTS for dynamic partitions 
in `TxnHandler.enqueueLockWithRetry`, since it waits for addDynamicPartitions 
to be called. If the application errors out before calling 
addDynamicPartitions, there will be files written by the aborted transaction 
that should not be read. But the cleaner removes the transaction from the 
metastore (since it has no entry in TXN_COMPONENTS) without deleting its 
files, so every read from that point onwards risks reading files written by an 
aborted and cleaned-up transaction.
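As a sanity check, the failure sequence above can be sketched as a toy model. This is a hypothetical simplification for illustration only; the names (txn_components, clean_aborted, etc.) loosely mirror the metastore concepts and are not Hive code:

```python
# Toy model of the bug: files are written before any TXN_COMPONENTS row
# exists, so an abort before addDynamicPartitions leaves orphaned files.
txn_components = set()   # (txn_id, partition) rows in TXN_COMPONENTS
aborted_txns = set()     # transactions in 'aborted' state
files_on_disk = set()    # (txn_id, partition) delta files

def write_dynamic_partition(txn_id, partition):
    # Dynamic-partition writes create files immediately; the
    # TXN_COMPONENTS row is only added later by addDynamicPartitions.
    files_on_disk.add((txn_id, partition))

def add_dynamic_partitions(txn_id, partitions):
    for p in partitions:
        txn_components.add((txn_id, p))

def abort(txn_id):
    aborted_txns.add(txn_id)

def clean_aborted():
    # The cleaner only schedules file deletion for partitions it finds in
    # TXN_COMPONENTS; a txn with no rows there looks empty, so its metadata
    # is dropped without touching its files.
    for txn_id in list(aborted_txns):
        for (t, p) in list(txn_components):
            if t == txn_id:
                files_on_disk.discard((t, p))
                txn_components.discard((t, p))
        aborted_txns.discard(txn_id)

# Failure sequence: the application dies before addDynamicPartitions.
write_dynamic_partition(42, "p=1")
abort(42)
clean_aborted()
# Txn 42 is gone from the metastore, but its files survive and can be read.
assert 42 not in aborted_txns
assert (42, "p=1") in files_on_disk
```

Once the aborted transaction's metadata is gone, nothing remains to tell readers that the surviving files are invalid.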

Is there any workaround for this, or any plan to fix it?

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-21052
>                 URL: https://issues.apache.org/jira/browse/HIVE-21052
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0, 3.1.1
>            Reporter: Jaume M
>            Assignee: Jaume M
>            Priority: Critical
>         Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>
> If the transaction is aborted between openTxn and addPartitions, and data 
> has been written to the table, the transaction manager will think it's an 
> empty transaction and no cleanup will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add the 
> corresponding partition entries to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS 
> indicating that a transaction was opened and then aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]
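The marker scheme proposed above could be sketched roughly as follows. This is a hypothetical illustration; the marker value, table shape, and function names are invented for the sketch and are not the actual schema or patch:

```python
# Toy sketch of the proposed fix: a marker row recorded at openTxn makes an
# early abort visible to the cleaner.
MARKER = "__dynpart_pending__"   # hypothetical marker value
txn_components = set()           # (txn_id, partition_or_marker)

def open_txn(txn_id):
    # Record the marker up front so an abort is never invisible.
    txn_components.add((txn_id, MARKER))

def add_partitions(txn_id, partitions):
    # Swap the marker for the real partition entries.
    txn_components.discard((txn_id, MARKER))
    for p in partitions:
        txn_components.add((txn_id, p))

def cleaner_targets(txn_id, all_partitions):
    # If the marker is still present, the txn aborted before addPartitions:
    # schedule cleanup jobs for every possible partition. Otherwise, clean
    # only the partitions the txn actually registered.
    if (txn_id, MARKER) in txn_components:
        return list(all_partitions)
    return [p for (t, p) in txn_components if t == txn_id]

open_txn(7)
assert cleaner_targets(7, ["p=1", "p=2"]) == ["p=1", "p=2"]
add_partitions(7, ["p=1"])
assert cleaner_targets(7, ["p=1", "p=2"]) == ["p=1"]
```

The trade-off is that an abort before addPartitions forces the cleaner to sweep every possible partition, since the metastore cannot know which ones were actually written.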



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
