[
https://issues.apache.org/jira/browse/HIVE-26150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528326#comment-17528326
]
Alessandro Solimando edited comment on HIVE-26150 at 4/26/22 5:34 PM:
----------------------------------------------------------------------
You are right, only _SortMergedDeleteEventRegistry_ uses _OrcRawRecordMerger_
(when memory is tight).
I found some tests covering this:
* [TestVectorizedOrcAcidRowBatchReader.java#L976|https://github.com/apache/hive/blob/a29810ce97a726fc70aecb53ebd648c3237106c4/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedOrcAcidRowBatchReader.java#L976]
* [TestVectorizedOrcAcidRowBatchReader.java#L1113|https://github.com/apache/hive/blob/a29810ce97a726fc70aecb53ebd648c3237106c4/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedOrcAcidRowBatchReader.java#L1113]
The issue does not reproduce there. One difference from the failing tests
listed in the JIRA description: in the failing tests, deletes are interleaved
with updates, whereas the tests linked above perform an insert followed by a
series of deletes.
I will try modifying those tests to add some updates between the deletes and
see whether the issue reproduces that way.
> OrcRawRecordMerger reads each row twice
> ---------------------------------------
>
> Key: HIVE-26150
> URL: https://issues.apache.org/jira/browse/HIVE-26150
> Project: Hive
> Issue Type: Bug
> Components: ORC, Transactions
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Priority: Major
>
> OrcRawRecordMerger reads each row twice. The issue does not surface because
> the merger is only used with the parameter "collapseEvents" set to true,
> which filters out one of the two rows.
> collapseEvents true and false should produce the same result: in the current
> ACID implementation each event has a distinct rowid, so two identical rows
> cannot occur. They occur here only because of the bug.
> To reproduce the issue, set the second parameter to false
> [here|https://github.com/apache/hive/blob/61d4ff2be48b20df9fd24692c372ee9c2606babe/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2103-L2106],
> then run the tests in TestOrcRawRecordMerger. Two tests fail:
> {code:bash}
> mvn test -Dtest=TestOrcRawRecordMerger -pl ql
> {code}
> {noformat}
> [INFO] Results:
> [INFO]
> [ERROR] Failures:
> [ERROR] TestOrcRawRecordMerger.testRecordReaderNewBaseAndDelta:1332 Found unexpected row: (0,ignore.1)
> [ERROR] TestOrcRawRecordMerger.testRecordReaderOldBaseAndDelta:1208 Found unexpected row: (0,ignore.1)
> {noformat}
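
The masking effect described in the issue can be illustrated with a toy model. The sketch below is not Hive code: the class CollapseEventsSketch, the RowKey type and the merge method are hypothetical, and it assumes that collapsing simply drops an event whose key equals the previous event's key in an already sorted stream, which is a simplification of what the real merger does. It only shows why a doubled read could go unnoticed with collapsing enabled and become visible once it is disabled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these names exist in Hive.
final class CollapseEventsSketch {

    /** Hypothetical stand-in for an ACID row key (NOT Hive's RecordIdentifier). */
    static final class RowKey {
        final long writeId;
        final int bucket;
        final long rowId;

        RowKey(long writeId, int bucket, long rowId) {
            this.writeId = writeId;
            this.bucket = bucket;
            this.rowId = rowId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RowKey)) {
                return false;
            }
            RowKey other = (RowKey) o;
            return writeId == other.writeId && bucket == other.bucket && rowId == other.rowId;
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(new long[] {writeId, bucket, rowId});
        }
    }

    /**
     * Walks a sorted stream of keys. With collapseEvents == true (assumed
     * behaviour for this sketch), an event whose key equals the previously
     * emitted key is skipped; with false, every event is emitted.
     */
    static List<RowKey> merge(List<RowKey> sortedEvents, boolean collapseEvents) {
        List<RowKey> out = new ArrayList<>();
        RowKey prev = null;
        for (RowKey key : sortedEvents) {
            if (collapseEvents && key.equals(prev)) {
                continue; // duplicate of the previous key: hidden by collapsing
            }
            out.add(key);
            prev = key;
        }
        return out;
    }

    public static void main(String[] args) {
        RowKey r0 = new RowKey(1, 0, 0);
        RowKey r1 = new RowKey(1, 0, 1);
        // Simulate a reader that returns each row twice.
        List<RowKey> doubled = Arrays.asList(r0, r0, r1, r1);

        System.out.println(merge(doubled, true).size());  // prints 2
        System.out.println(merge(doubled, false).size()); // prints 4
    }
}
```

In this model the two settings agree whenever every key is distinct, which matches the expectation stated in the description; they diverge only when the input itself contains repeated keys.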