[ https://issues.apache.org/jira/browse/HIVE-22413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hocheol Park updated HIVE-22413:
--------------------------------
Attachment: HIVE-22413.1.patch
> Avoid dirty read when reading the ACID table while compaction is running
> ------------------------------------------------------------------------
>
> Key: HIVE-22413
> URL: https://issues.apache.org/jira/browse/HIVE-22413
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Reporter: Hocheol Park
> Priority: Major
> Attachments: HIVE-22413.1.patch
>
>
> A dirty read can occur when reading an ACID table while the compactor is
> creating base or delta directories. This is especially likely on S3 storage,
> because a “move” on S3 is implemented as “copy and delete”, and the copy takes
> a long time when the files are large or the bucket count is high.
> The following logic avoids this problem. While listing the child directories
> of a partition during a read of an ACID table, if any “_tmp”-prefixed
> directories exist, compare the names of the directories inside them with the
> child directories being listed, and skip any child directory with the same
> name. The reader then reads the files as they were before the merge, so the
> results are unchanged.
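The listing rule in the description can be sketched as follows. This is a minimal illustration, not the code in HIVE-22413.1.patch. It assumes that each “_tmp”-prefixed directory in the partition contains entries named after the base/delta directories the compactor is producing. It also uses `java.nio.file` on a local disk in place of the Hadoop `FileSystem` API that Hive actually lists through. The class name `TmpAwareAcidListing`, the method `listVisibleAcidDirs`, and the directory name `_tmp_compactor` are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TmpAwareAcidListing {

    // Hypothetical helper: returns the names of a partition's child directories,
    // skipping any whose name also appears inside a "_tmp"-prefixed directory.
    // Assumption: java.nio.file on a local disk stands in for Hadoop's FileSystem.
    public static List<String> listVisibleAcidDirs(Path partition) throws IOException {
        List<Path> children = new ArrayList<>();
        Set<String> inProgress = new HashSet<>();

        try (DirectoryStream<Path> stream = Files.newDirectoryStream(partition)) {
            for (Path child : stream) {
                if (!Files.isDirectory(child)) {
                    continue; // only directories (base_*, delta_*, _tmp*) matter here
                }
                String name = child.getFileName().toString();
                if (name.startsWith("_tmp")) {
                    // Record the names of the directories the compactor is still producing.
                    try (DirectoryStream<Path> tmp = Files.newDirectoryStream(child)) {
                        for (Path produced : tmp) {
                            inProgress.add(produced.getFileName().toString());
                        }
                    }
                } else {
                    children.add(child);
                }
            }
        }

        List<String> visible = new ArrayList<>();
        for (Path child : children) {
            String name = child.getFileName().toString();
            // A name that also exists under "_tmp" means the move may be incomplete,
            // so skip it and let the reader use the pre-compaction files instead.
            if (!inProgress.contains(name)) {
                visible.add(name);
            }
        }
        Collections.sort(visible);
        return visible;
    }

    public static void main(String[] args) throws IOException {
        Path partition = Files.createTempDirectory("acid_partition");
        Files.createDirectories(partition.resolve("base_0000001"));
        Files.createDirectories(partition.resolve("delta_0000002_0000002"));
        Files.createDirectories(partition.resolve("delta_0000003_0000003"));
        // Compaction output still being moved: listed under _tmp and (partially) at its final name.
        Files.createDirectories(partition.resolve("_tmp_compactor").resolve("base_0000003"));
        Files.createDirectories(partition.resolve("base_0000003"));

        System.out.println(listVisibleAcidDirs(partition));
    }
}
```

On the sample layout, `main` prints `[base_0000001, delta_0000002_0000002, delta_0000003_0000003]`. The partially moved `base_0000003` is hidden, so a reader falls back to the older base plus the deltas, and the results stay the same.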
--
This message was sent by Atlassian Jira
(v8.3.4#803005)