[ https://issues.apache.org/jira/browse/HIVE-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ádám Szita reassigned HIVE-24266:
---------------------------------
> Committed rows in hflush'd ACID files may be missing from query result
> ----------------------------------------------------------------------
>
> Key: HIVE-24266
> URL: https://issues.apache.org/jira/browse/HIVE-24266
> Project: Hive
> Issue Type: Bug
> Reporter: Ádám Szita
> Assignee: Ádám Szita
> Priority: Major
>
> In an HDFS environment, if a writer uses hflush to write ORC ACID files
> during a transaction commit, committed rows may appear to be missing when
> the table is read before the file is completely persisted to disk (that is,
> synced).
> This happens because hflush does not persist the new buffers to disk; it
> only ensures that new readers can see the new content. As a result, the
> block information that BISplitStrategy relies on is incomplete. Although
> the side file (_flush_length) tracks the correct end of the file being
> written, that information is ignored in favour of the block information,
> so we may end up generating a very short split instead of one covering the
> larger, available length.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)