[ https://issues.apache.org/jira/browse/HIVE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508925#comment-16508925 ]
Eugene Koifman commented on HIVE-19838:
---------------------------------------
I think one way {{totalDeleteEventCount}} in
{{ColumnizedDeleteEventRegistry}} may be off is that {{DeleteReaderValue}}
takes a {{ValidWriteIdList}}, which means that {{next()}} may skip an event
because it belongs to a transaction that was not yet committed when the current
reader locked in its snapshot.
In practice, this would require a compaction (at least a minor one) that
includes a txn still open in the reader's snapshot to complete before the
VectorizedOrc reader starts reading, which is possible but not very likely.
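To illustrate the mismatch, here is a minimal sketch, not the actual Hive code. It assumes each delete event carries the write id of the txn that issued it, and it uses a plain {{Set<Long>}} as a stand-in for {{ValidWriteIdList}}. The class and method names are hypothetical.

```java
import java.util.Set;

// Hypothetical sketch: a plain Set<Long> stands in for the reader's ValidWriteIdList.
final class SnapshotFilterSketch {
    /**
     * Counts the delete events a reader would actually see.
     * deletingWriteIds[i] is the write id of the txn that issued the i-th delete
     * (an assumption made for this illustration).
     */
    static int countVisible(long[] deletingWriteIds, Set<Long> validWriteIds) {
        int visible = 0;
        for (long writeId : deletingWriteIds) {
            // Events from txns outside the snapshot are skipped, as next() would do.
            if (validWriteIds.contains(writeId)) {
                visible++;
            }
        }
        return visible;
    }
}
```

With five events issued by write ids 1, 1, 2, 3, 3 and only write ids 1 and 2 valid in the snapshot, the reader sees 3 events while a count taken from the file would say 5, so anything sized or indexed by the larger number would not line up with what is returned.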
Another issue, which I think is eliminated by the current patch, is:
{noformat}
if (lastSeenOwid != deleteRecordKey.originalWriteId ||
    lastSeenBucketProperty != deleteRecordKey.bucketProperty) {
  ++distinctOwids;
  lastSeenOwid = deleteRecordKey.originalWriteId;
  lastSeenBucketProperty = deleteRecordKey.bucketProperty;
}
{noformat}
{{distinctOwids}} is incremented when bucketProperty changes, which seems
invalid even for bucketed tables.
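A small sketch of the overcount, assuming the keys arrive sorted so that equal write ids are adjacent. Each key is modeled as a hypothetical {{long[]}} pair of (originalWriteId, bucketProperty) and the class name is made up for the example; it is not the patch itself.

```java
// Hypothetical sketch: each key is {originalWriteId, bucketProperty}.
final class DistinctOwidSketch {
    /** Mirrors the quoted condition: bumps the counter when either field changes. */
    static int countAsQuoted(long[][] keys) {
        int distinct = 0;
        long lastOwid = -1;
        long lastBucket = -1;
        for (long[] key : keys) {
            if (lastOwid != key[0] || lastBucket != key[1]) {
                ++distinct;
                lastOwid = key[0];
                lastBucket = key[1];
            }
        }
        return distinct;
    }

    /** Bumps the counter only when originalWriteId changes. */
    static int countByOwidOnly(long[][] keys) {
        int distinct = 0;
        long lastOwid = -1;
        for (long[] key : keys) {
            if (lastOwid != key[0]) {
                ++distinct;
                lastOwid = key[0];
            }
        }
        return distinct;
    }
}
```

For the keys (1,0), (1,0), (1,1), (2,0) the quoted condition counts 3, while there are only 2 distinct original write ids.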
> simplify & fix ColumnizedDeleteEventRegistry load loop
> ------------------------------------------------------
>
> Key: HIVE-19838
> URL: https://issues.apache.org/jira/browse/HIVE-19838
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: HIVE-19838.01.patch, HIVE-19838.patch
>
>
> Apparently sometimes the delete count in ACID stats doesn't match what merger
> actually returns.
> It could be due to some deltas having duplicate deletes from parallel queries
> (I guess?) that are being squashed by the merger or some other reasons beyond
> my mortal comprehension.
> The loop assumes the merger will return the exact number of records, so it
> fails with array index exception. Also, it could actually be done in a single
> loop.