Mehakmeet Singh created HADOOP-18596:
----------------------------------------
Summary: Distcp -update between different cloud stores to use
modification time while checking for file skip.
Key: HADOOP-18596
URL: https://issues.apache.org/jira/browse/HADOOP-18596
Project: Hadoop Common
Issue Type: Improvement
Components: tools/distcp
Reporter: Mehakmeet Singh
Assignee: Mehakmeet Singh
Distcp -update currently relies on File size, block size, and Checksum
comparisons to figure out which files should be skipped or copied.
Since different cloud stores have different checksum algorithms we should check
for modification time as well to the checks.
This would ensure that while performing -update if the files are perceived to
be out of sync we should copy them. The machines between which the file
transfers occur should be in time sync to avoid any extra copies.
Improving testing and documentation for modification time checks between
different object stores to ensure no incorrect skipping of files.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]