[
https://issues.apache.org/jira/browse/HIVE-20730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Saurabh Seth updated HIVE-20730:
--------------------------------
Attachment: HIVE-20730.patch
Status: Patch Available (was: Open)
I have tweaked {{VectorizedOrcAcidRowBatchReader.findMinMaxKeys}} to set a SARG
on delete_delta based on the stripe stats when the {{hive.acid.key.index}}
is not present.
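To illustrate the idea only (this is not the code in the patch), here is a
minimal, self-contained sketch. {{RowKey}}, {{StripeStats}} and both methods are
hypothetical stand-ins, not Hive or ORC classes. It assumes that per-stripe
min/max values are available for the three ROW__ID columns, and it filters
delete events in memory rather than pushing a SARG into the ORC reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DeleteEventFilterSketch {

    /** Hypothetical stand-in for a ROW__ID, ordered by (writeId, bucket, rowId). */
    static final class RowKey implements Comparable<RowKey> {
        final long writeId;
        final int bucket;
        final long rowId;

        RowKey(long writeId, int bucket, long rowId) {
            this.writeId = writeId;
            this.bucket = bucket;
            this.rowId = rowId;
        }

        @Override
        public int compareTo(RowKey o) {
            int c = Long.compare(writeId, o.writeId);
            if (c != 0) return c;
            c = Integer.compare(bucket, o.bucket);
            if (c != 0) return c;
            return Long.compare(rowId, o.rowId);
        }
    }

    /** Hypothetical per-stripe min/max statistics for the three ROW__ID columns. */
    static final class StripeStats {
        final long minWriteId, maxWriteId;
        final int minBucket, maxBucket;
        final long minRowId, maxRowId;

        StripeStats(long minWriteId, long maxWriteId, int minBucket, int maxBucket,
                    long minRowId, long maxRowId) {
            this.minWriteId = minWriteId;
            this.maxWriteId = maxWriteId;
            this.minBucket = minBucket;
            this.maxBucket = maxBucket;
            this.minRowId = minRowId;
            this.maxRowId = maxRowId;
        }
    }

    /**
     * Returns {min, max}, a conservative key range covering every row in the
     * given stripes, or null when there are no stats to derive a range from.
     */
    static RowKey[] keyRangeFromStripeStats(List<StripeStats> stripes) {
        if (stripes.isEmpty()) return null;
        long minW = Long.MAX_VALUE, maxW = Long.MIN_VALUE;
        int minB = Integer.MAX_VALUE, maxB = Integer.MIN_VALUE;
        long minR = Long.MAX_VALUE, maxR = Long.MIN_VALUE;
        for (StripeStats s : stripes) {
            minW = Math.min(minW, s.minWriteId);
            maxW = Math.max(maxW, s.maxWriteId);
            minB = Math.min(minB, s.minBucket);
            maxB = Math.max(maxB, s.maxBucket);
            minR = Math.min(minR, s.minRowId);
            maxR = Math.max(maxR, s.maxRowId);
        }
        // Column-wise minima (maxima) form a key that is <= (>=) every real key.
        return new RowKey[] {new RowKey(minW, minB, minR), new RowKey(maxW, maxB, maxR)};
    }

    /** Keeps only delete events inside [min, max]; keeps all if range is null. */
    static List<RowKey> filterDeleteEvents(List<RowKey> deletes, RowKey[] range) {
        if (range == null) return deletes;
        List<RowKey> kept = new ArrayList<>();
        for (RowKey d : deletes) {
            if (d.compareTo(range[0]) >= 0 && d.compareTo(range[1]) <= 0) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<StripeStats> stripes = Arrays.asList(
            new StripeStats(5, 5, 1, 1, 0, 99),
            new StripeStats(5, 6, 1, 1, 100, 199));
        List<RowKey> deletes = Arrays.asList(
            new RowKey(4, 1, 10), new RowKey(5, 1, 42), new RowKey(7, 1, 3));
        RowKey[] range = keyRangeFromStripeStats(stripes);
        System.out.println(filterDeleteEvents(deletes, range).size()); // prints 1
    }
}
```

A range derived this way is looser than one read from the key index, because
it combines column-wise minima and maxima instead of using actual keys. It can
only let extra delete events through and never drops one that applies.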
[~ekoifman], I couldn't add a unit test for this because I don't fully
understand how the query-based compactor would generate such a file
(OrcRecordUpdater seems to always write the index). I tested this change by
ignoring the index present in files written by OrcRecordUpdater. If you have
any suggestions, please let me know.
> Do delete event filtering even if hive.acid.index is not there
> --------------------------------------------------------------
>
> Key: HIVE-20730
> URL: https://issues.apache.org/jira/browse/HIVE-20730
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 4.0.0
> Reporter: Eugene Koifman
> Assignee: Saurabh Seth
> Priority: Major
> Attachments: HIVE-20730.patch
>
>
> Since HIVE-16812, {{VectorizedOrcAcidRowBatchReader}} filters delete events
> based on min/max ROW__ID in the split, which relies on {{hive.acid.index}}
> being in the ORC footer.
> There is no way to generate {{hive.acid.index}} from a plain query as in
> HIVE-20699, so we need to make sure that we generate a SARG into
> delete_delta/bucket_x based on stripe stats even if the index is missing.