Hung Tran created GOBBLIN-1057:
----------------------------------
Summary: Optimize unnecessary RPCs in distcp-ng
Key: GOBBLIN-1057
URL: https://issues.apache.org/jira/browse/GOBBLIN-1057
Project: Apache Gobblin
Issue Type: Improvement
Reporter: Hung Tran
Gobblin distcp-ng invokes several FileSystem RPCs for every file it discovers.
This makes the file discovery phase slow: it can take hours for only a few
thousand files.
The following RPCs can be removed:
- getFileChecksum(): the returned value does not appear to be used.
- getFileStatus(): this is called to get the modification time in
  ModTimeDataFileVersionStrategy.getVersion(). The modification time is already
  available from listStatus(), so that value can be used instead.
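
To illustrate the getFileStatus() point, here is a minimal, self-contained sketch. It is not Gobblin or Hadoop code: CountingFs, Status, and both versions... methods are hypothetical stand-ins that only count simulated RPCs, assuming a listing call returns each file's modification time along with its path.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ListingReuseSketch {

    /** Hypothetical stand-in for the per-file metadata a listing returns. */
    static final class Status {
        final String path;
        final long modificationTime;

        Status(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }
    }

    /** Hypothetical in-memory "remote" file system that counts each simulated RPC. */
    static final class CountingFs {
        private final Map<String, Long> files = new LinkedHashMap<>();
        int rpcCount = 0;

        void addFile(String path, long modificationTime) {
            files.put(path, modificationTime);
        }

        /** One RPC returns metadata for every file, including its modification time. */
        List<Status> listStatus() {
            rpcCount++;
            List<Status> result = new ArrayList<>();
            for (Map.Entry<String, Long> e : files.entrySet()) {
                result.add(new Status(e.getKey(), e.getValue()));
            }
            return result;
        }

        /** One RPC per call, returning metadata for a single file. */
        Status getFileStatus(String path) {
            rpcCount++;
            return new Status(path, files.get(path));
        }
    }

    /** Pattern described in the issue: list once, then look each file up again. */
    static Map<String, Long> versionsWithPerFileRpc(CountingFs fs) {
        Map<String, Long> versions = new LinkedHashMap<>();
        for (Status listed : fs.listStatus()) {
            // Redundant round trip: the listing already carried this value.
            versions.put(listed.path, fs.getFileStatus(listed.path).modificationTime);
        }
        return versions;
    }

    /** Proposed pattern: reuse the modification time returned by the listing. */
    static Map<String, Long> versionsFromListing(CountingFs fs) {
        Map<String, Long> versions = new LinkedHashMap<>();
        for (Status listed : fs.listStatus()) {
            versions.put(listed.path, listed.modificationTime);
        }
        return versions;
    }

    public static void main(String[] args) {
        CountingFs fs = new CountingFs();
        fs.addFile("/data/a", 100L);
        fs.addFile("/data/b", 200L);
        fs.addFile("/data/c", 300L);

        Map<String, Long> perFile = versionsWithPerFileRpc(fs);
        System.out.println("per-file lookups: " + fs.rpcCount + " RPCs");

        fs.rpcCount = 0;
        Map<String, Long> reused = versionsFromListing(fs);
        System.out.println("reuse listing:    " + fs.rpcCount + " RPCs");
        System.out.println("same versions:    " + perFile.equals(reused));
    }
}
```

In this toy model the per-file variant costs one RPC for the listing plus one per file, while the reuse variant costs a single RPC regardless of file count and yields the same versions. The real savings depend on how distcp-ng actually lists directories.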