Peter Vary created HIVE-23764:
---------------------------------

             Summary: Remove unnecessary getLastFlushLength when checking delete delta files
                 Key: HIVE-23764
                 URL: https://issues.apache.org/jira/browse/HIVE-23764
             Project: Hive
          Issue Type: Improvement
          Components: Transactions
            Reporter: Peter Vary
            Assignee: Peter Vary


VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry calls OrcAcidUtils.getLastFlushLength for every delete delta file.
Even the code comment at that call site says:
{code}
              // NOTE: Calling last flush length below is more for future-proofing when we have
              // streaming deletes. But currently we don't support streaming deletes, and this can
              // be removed if this becomes a performance issue.
{code}

If we have a table with 5 updates (1 base + 5 delta + 5 delete_delta directories), then for every base and delta directory we check all of the delete_delta directories and call getLastFlushLength on each of them, which results in 6 * 5 = 30 unnecessary NameNode (NN)/S3 calls.
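The arithmetic above can be checked with a small back-of-the-envelope model. This is a hypothetical sketch, not Hive code: the class and method names are made up, and it assumes each base or delta directory is read by its own reader and that every reader calls getLastFlushLength once per delete_delta directory (i.e. one delete delta file per directory).

```java
/**
 * Hypothetical model of the extra file-system calls described in HIVE-23764.
 * Not Hive code; it only reproduces the counting argument from the issue.
 */
public final class FlushLengthCallCount {

    private FlushLengthCallCount() {}

    /**
     * Counts getLastFlushLength calls, assuming one reader per base/delta
     * directory and one call per delete_delta directory for every reader.
     */
    static int countFlushLengthCalls(int baseDirs, int deltaDirs, int deleteDeltaDirs) {
        int readers = baseDirs + deltaDirs;   // one reader per base or delta directory
        return readers * deleteDeltaDirs;     // each reader checks every delete_delta directory
    }

    public static void main(String[] args) {
        // The example from the issue: 1 base + 5 delta + 5 delete_delta directories.
        int calls = countFlushLengthCalls(1, 5, 5);
        System.out.println("Unnecessary NN/S3 calls: " + calls); // prints "Unnecessary NN/S3 calls: 30"
    }
}
```

Under this model, a table with n updates produces (n + 1) * n calls, so the overhead grows roughly quadratically with the number of updates until the table is compacted.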

We should remove the check as already proposed in the comment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)