[
https://issues.apache.org/jira/browse/HIVE-23764?focusedWorklogId=451310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451310
]
ASF GitHub Bot logged work on HIVE-23764:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 25/Jun/20 22:08
Start Date: 25/Jun/20 22:08
Worklog Time Spent: 10m
Work Description: pvary opened a new pull request #1185:
URL: https://github.com/apache/hive/pull/1185
## NOTICE
Please create an issue in ASF JIRA before opening a pull request,
and you need to set the title of the pull request which starts with
the corresponding JIRA issue number. (e.g. HIVE-XXXXX: Fix a typo in YYY)
For more details, please see
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 451310)
Remaining Estimate: 0h
Time Spent: 10m
> Remove unnecessary getLastFlushLength when checking delete delta files
> ----------------------------------------------------------------------
>
> Key: HIVE-23764
> URL: https://issues.apache.org/jira/browse/HIVE-23764
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry calls
> OrcAcidUtils.getLastFlushLength for every delete delta file.
> Even the comment says:
> {code}
> // NOTE: Calling last flush length below is more for future-proofing when we have
> // streaming deletes. But currently we don't support streaming deletes, and this can
> // be removed if this becomes a performance issue.
> {code}
> If we have a table with 5 updates (1 base + 5 delta + 5 delete_delta), then
> for each of the 6 base and delta directories we check all 5 delete_delta
> directories and call getLastFlushLength for each one, which results in
> 6*5=30 unnecessary NN/S3 calls.
> We should remove the check, as the comment already proposes.
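The call count in the description can be illustrated with a small standalone sketch. It does not use any Hive or Hadoop classes; the class name, method names, and directory names below are hypothetical and only model the nested iteration the description refers to (every base or delta reader visiting every delete_delta directory).

```java
import java.util.ArrayList;
import java.util.List;

public class SideFileCheckCount {
    // Counter for simulated file-system round trips (NameNode or S3).
    static int calls = 0;

    // Hypothetical stand-in for a per-directory last-flush-length check.
    // It only counts invocations; it does not touch Hive or Hadoop.
    static void checkLastFlushLength(String deleteDeltaDir) {
        calls++;
    }

    // Each reader (one per base or delta dir) checks every delete_delta dir.
    static int countChecks(List<String> readers, List<String> deleteDeltas) {
        calls = 0;
        for (String reader : readers) {
            for (String deleteDelta : deleteDeltas) {
                checkLastFlushLength(deleteDelta);
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // 1 base + 5 delta directories, as in the example from the issue.
        List<String> readers = new ArrayList<>();
        readers.add("base");
        for (int i = 1; i <= 5; i++) {
            readers.add("delta_" + i);
        }
        // 5 delete_delta directories.
        List<String> deleteDeltas = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            deleteDeltas.add("delete_delta_" + i);
        }
        System.out.println(countChecks(readers, deleteDeltas)); // prints 30
    }
}
```

The count grows as (base + delta dirs) x (delete_delta dirs), so dropping the per-file check removes that many round trips in this model.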
--
This message was sent by Atlassian Jira
(v8.3.4#803005)