[
https://issues.apache.org/jira/browse/HIVE-25790?focusedWorklogId=806943&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-806943
]
ASF GitHub Bot logged work on HIVE-25790:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 08/Sep/22 09:27
Start Date: 08/Sep/22 09:27
Worklog Time Spent: 10m
Work Description: pudidic opened a new pull request, #3582:
URL: https://github.com/apache/hive/pull/3582
### What changes were proposed in this pull request?
Changed FileUtils.copy() to skip files that are already identical in the
destination directory, which improves copy performance. Previously,
FileUtils.copy() simply removed and recreated the destination directory. With
this change, it compares each file and directory and deletes only those that
differ.
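The compare-then-copy idea described above can be sketched roughly as follows. This is not the code from the pull request. It is a minimal illustration that assumes a local filesystem and uses `java.nio.file` rather than Hadoop's `FileSystem` API. The class name `IncrementalCopySketch`, the method `syncCopy`, and the "same size and same bytes" identity check are all hypothetical choices made for the example; the PR description does not say how files are compared.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: NOT Hive's FileUtils. Local filesystem only.
public class IncrementalCopySketch {

    /**
     * Makes dst mirror src, skipping files that are already identical.
     * Assumes the parent of dst exists. Returns the number of files copied.
     */
    public static int syncCopy(Path src, Path dst) throws IOException {
        if (Files.isDirectory(src)) {
            // A regular file where a directory is expected counts as "different".
            if (Files.exists(dst) && !Files.isDirectory(dst)) {
                Files.delete(dst);
            }
            Files.createDirectories(dst);

            // Delete destination entries that have no counterpart in the source.
            for (Path entry : list(dst)) {
                if (!Files.exists(src.resolve(entry.getFileName()))) {
                    deleteRecursively(entry);
                }
            }

            // Recurse into every source entry.
            int copied = 0;
            for (Path child : list(src)) {
                copied += syncCopy(child, dst.resolve(child.getFileName()));
            }
            return copied;
        }

        // src is a regular file.
        if (Files.isDirectory(dst)) {
            deleteRecursively(dst);
        } else if (Files.exists(dst) && sameContent(src, dst)) {
            return 0; // identical file already present: skip the copy
        }
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        return 1;
    }

    // Assumed identity check: equal size, then equal bytes.
    static boolean sameContent(Path a, Path b) throws IOException {
        return Files.size(a) == Files.size(b)
                && Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }

    // Snapshot the directory listing so entries can be deleted safely afterwards.
    static List<Path> list(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.collect(Collectors.toList());
        }
    }

    static void deleteRecursively(Path root) throws IOException {
        if (Files.isDirectory(root)) {
            for (Path child : list(root)) {
                deleteRecursively(child);
            }
        }
        Files.delete(root);
    }
}
```

In this sketch, a retry after a partial copy only transfers the files that are missing or different, and a second call on an already synchronized destination copies nothing. Reading whole files into memory is acceptable for an illustration only; a real implementation on a distributed filesystem would more plausibly compare cheaper metadata such as length or checksums.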
### Why are the changes needed?
An optimized replication bootstrap copies many files, possibly thousands, from
source to destination. If the copy fails partway through, it is retried. At
that point some files have already been copied, but the current implementation
removes them and copies everything again. It should skip the files that were
already copied.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added a few JUnit test scenarios to TestFileUtils and TestCopyUtils.
The change will also be run through the automated regression test suites on
the test server.
Issue Time Tracking
-------------------
Worklog Id: (was: 806943)
Remaining Estimate: 0h
Time Spent: 10m
> Make managed table copies handle updates (FileUtils)
> ----------------------------------------------------
>
> Key: HIVE-25790
> URL: https://issues.apache.org/jira/browse/HIVE-25790
> Project: Hive
> Issue Type: Improvement
> Reporter: Haymant Mangla
> Assignee: Teddy Choi
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)