[ 
https://issues.apache.org/jira/browse/HIVE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508925#comment-16508925
 ] 

Eugene Koifman commented on HIVE-19838:
---------------------------------------

I think one way {{totalDeleteEventCount}} in 
{{ColumnizedDeleteEventRegistry}} may be off is that {{DeleteReaderValue}} 
takes a ValidWriteIdList, which means that {{next()}} may skip an event 
because it belongs to a transaction that was not yet committed when the current 
reader locked in its snapshot.
In practice, this would require a compaction (at least a minor one) that 
includes a txn still open from the point of view of the reader's txn to 
complete before the VectorizedOrc reader starts reading, which is possible but 
not very likely.
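
To illustrate the mismatch, here is a minimal sketch, not Hive code: the class 
{{SnapshotFilterSketch}} and the method {{loadVisible}} are hypothetical 
stand-ins, and a plain array of write ids stands in for the ValidWriteIdList. 
It only shows that a count recorded when the deltas were written can be larger 
than the number of events a snapshot-filtered reader returns, so a load loop is 
safer if it sizes its output from what was actually read.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a delete-event load loop; not a Hive class.
final class SnapshotFilterSketch {

    /**
     * Returns the write ids of the delete events visible to the reader.
     * eventWriteIds: write id of each stored delete event, in file order.
     * validWriteIds: write ids committed in the reader's snapshot
     *                (assumed stand-in for a ValidWriteIdList).
     */
    static long[] loadVisible(long[] eventWriteIds, long[] validWriteIds) {
        Set<Long> valid = new HashSet<>();
        for (long id : validWriteIds) {
            valid.add(id);
        }

        // Collect into a growable list instead of an array pre-sized from
        // the stored total, since some events may be skipped.
        List<Long> kept = new ArrayList<>();
        for (long id : eventWriteIds) {
            if (valid.contains(id)) {
                kept.add(id);
            }
        }

        // Size the result from the number of events actually returned.
        long[] out = new long[kept.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = kept.get(i);
        }
        return out;
    }
}
```

With four stored events written by write ids 5, 6, 6 and 7, and a snapshot in 
which only 5 and 7 are valid, the stored total is 4 but only 2 events come 
back.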

Another issue, which I think is eliminated by the current patch, is:
{noformat}
        if (lastSeenOwid != deleteRecordKey.originalWriteId ||
          lastSeenBucketProperty != deleteRecordKey.bucketProperty) {
          ++distinctOwids;
          lastSeenOwid = deleteRecordKey.originalWriteId;
          lastSeenBucketProperty = deleteRecordKey.bucketProperty;
        }
{noformat}
{{distinctOwids}} is also incremented when only {{bucketProperty}} changes, 
which seems invalid even for bucketed tables.
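
A small sketch of the difference, under the assumption that delete events 
arrive sorted by originalWriteId and then by bucketProperty (the snippet above 
does not show the ordering). {{DistinctOwidSketch}} and both method names are 
hypothetical. The first method mirrors the quoted condition, the second counts 
a new owid only when the write id itself changes.

```java
// Hypothetical illustration; not taken from the Hive source or the patch.
final class DistinctOwidSketch {

    /** Mirrors the quoted condition: counts a change in owid OR bucketProperty. */
    static int countOwidOrBucketChanges(long[] owids, int[] bucketProps) {
        int distinct = 0;
        long lastOwid = 0;
        int lastBucket = 0;
        boolean first = true;
        for (int i = 0; i < owids.length; i++) {
            if (first || lastOwid != owids[i] || lastBucket != bucketProps[i]) {
                ++distinct;
                lastOwid = owids[i];
                lastBucket = bucketProps[i];
                first = false;
            }
        }
        return distinct;
    }

    /** Counts only changes in owid; correct when equal owids are contiguous. */
    static int countOwidChanges(long[] owids) {
        int distinct = 0;
        long lastOwid = 0;
        boolean first = true;
        for (long owid : owids) {
            if (first || lastOwid != owid) {
                ++distinct;
                lastOwid = owid;
                first = false;
            }
        }
        return distinct;
    }
}
```

For events with (owid, bucketProperty) of (1, 0), (1, 1), (2, 0), the first 
method returns 3 while there are only 2 distinct write ids.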


> simplify & fix ColumnizedDeleteEventRegistry load loop
> ------------------------------------------------------
>
>                 Key: HIVE-19838
>                 URL: https://issues.apache.org/jira/browse/HIVE-19838
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>         Attachments: HIVE-19838.01.patch, HIVE-19838.patch
>
>
> Apparently sometimes the delete count in ACID stats doesn't match what merger 
> actually returns.
> It could be due to some deltas having duplicate deletes from parallel queries 
> (I guess?) that are being squashed by the merger or some other reasons beyond 
> my mortal comprehension.
> The loop assumes the merger will return the exact number of records, so it 
> fails with array index exception. Also, it could actually be done in a single 
> loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
