Peter Vary created HIVE-23764:

             Summary: Remove unnecessary getLastFlushLength when checking delete delta files
                 Key: HIVE-23764
             Project: Hive
          Issue Type: Improvement
          Components: Transactions
            Reporter: Peter Vary
            Assignee: Peter Vary

VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry calls 
OrcAcidUtils.getLastFlushLength for every delete delta file.
The comment in the code already marks this call as removable:
              // NOTE: Calling last flush length below is more for future-proofing when we have
              // streaming deletes. But currently we don't support streaming deletes, and this can
              // be removed if this becomes a performance issue.

If a table has had 5 updates (1 base + 5 delta + 5 delete_delta directories), 
then for each of the 6 base and delta directories we check all 5 delete_delta 
directories and call getLastFlushLength on each one, which results in 6*5=30 
unnecessary NameNode (NN) or S3 calls.
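
The count above can be reproduced with a small standalone sketch. This is not Hive code: the class DeleteDeltaCallCount and the method countFlushLengthChecks are hypothetical names used only for illustration, and the inner increment merely stands in for one OrcAcidUtils.getLastFlushLength lookup. It assumes, as described above, that each base or delta directory triggers one check per delete_delta directory.

```java
public class DeleteDeltaCallCount {

    // Hypothetical helper: counts how many flush-length lookups would be
    // issued if every base/delta directory checks every delete_delta directory.
    static int countFlushLengthChecks(int baseDirs, int deltaDirs, int deleteDeltaDirs) {
        int calls = 0;
        int readers = baseDirs + deltaDirs; // one reader per base or delta dir
        for (int r = 0; r < readers; r++) {
            for (int d = 0; d < deleteDeltaDirs; d++) {
                // Stands in for one getLastFlushLength call,
                // i.e. one file system (NN/S3) round trip.
                calls++;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // 1 base + 5 delta + 5 delete_delta, as in the example above.
        System.out.println(countFlushLengthChecks(1, 5, 5));
    }
}
```

The count grows with the product of the two directory counts, so tables with many uncompacted updates pay proportionally more for a check that is not currently needed.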

We should remove the check as already proposed in the comment.

