[jira] [Resolved] (HIVE-24266) Committed rows in hflush'd ACID files may be missing from query result

Jira Wed, 14 Oct 2020 12:23:01 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ádám Szita resolved HIVE-24266.
-------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Committed to master. Thanks for the review [~pvary]

> Committed rows in hflush'd ACID files may be missing from query result
> ----------------------------------------------------------------------
>
>                 Key: HIVE-24266
>                 URL: https://issues.apache.org/jira/browse/HIVE-24266
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In an HDFS environment, if a writer uses hflush to write ORC ACID files 
> during a transaction commit, the committed rows might appear to be missing 
> when the table is read before the file is completely persisted to disk 
> (i.e. synced).
> This is because hflush does not persist the new buffers to disk; it only 
> ensures that new readers can see the new content. As a result, the block 
> information on which BISplitStrategy relies is incomplete. Although the 
> side file (_flush_length) tracks the proper end of the file that is being 
> written, this information is ignored in favour of the block information, 
> and we may end up generating a very short split instead of one covering 
> the larger, available length.
> When ETLSplitStrategy is used, there is not even an attempt to rely on the 
> ACID side file when calculating the file length, so that needs to be fixed 
> too.
> Moreover, newly committed rows might not appear due to OrcTail caching in 
> ETLSplitStrategy. For now I'm just going to recommend turning that cache 
> off to anyone who wants real-time row updates to be read:
> {code:java}
> set hive.orc.cache.stripe.details.mem.size=0;
> {code}
> ...as tweaking that code would probably open a can of worms.
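
To illustrate the length calculation described in the issue, here is a minimal Java sketch of preferring the side file over block information. It is not the committed patch: the class FlushLengthSketch and both method names are hypothetical, and it assumes the _flush_length side file is a sequence of big-endian 8-byte lengths, one appended per flush, with the last complete entry being the valid one.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only; names are hypothetical and not part of Hive's API.
final class FlushLengthSketch {

    // Returns the last complete 8-byte length found in the side file
    // contents, or -1 if the side file holds no complete entry.
    // Assumption: entries are big-endian longs appended one per flush.
    static long lastFlushLength(byte[] sideFileBytes) {
        ByteBuffer buf = ByteBuffer.wrap(sideFileBytes); // big-endian by default
        long result = -1L;
        while (buf.remaining() >= Long.BYTES) {
            result = buf.getLong();
        }
        return result; // a trailing partial entry is ignored
    }

    // Prefers the side-file length when one is available; otherwise falls
    // back to the length derived from block information.
    static long readableLength(long blockReportedLength, long sideFileLength) {
        return sideFileLength >= 0 ? sideFileLength : blockReportedLength;
    }

    public static void main(String[] args) {
        // Two completed flushes plus three bytes of an unfinished entry.
        byte[] side = ByteBuffer.allocate(19)
                .putLong(100L).putLong(200L).put(new byte[3]).array();
        long fromSideFile = lastFlushLength(side);
        System.out.println(readableLength(50L, fromSideFile));
    }
}
```

In this sketch a split generator would size its split from readableLength rather than from the block-reported length alone, which is the short-split problem the description attributes to BISplitStrategy.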



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
