[ https://issues.apache.org/jira/browse/CRUNCH-660?focusedWorklogId=200837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-200837 ]
ASF GitHub Bot logged work on CRUNCH-660:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Feb/19 19:42
            Start Date: 19/Feb/19 19:42
    Worklog Time Spent: 10m
      Work Description: noslowerdna commented on pull request #17: CRUNCH-660, CRUNCH-675: Use DistCp instead of FileUtils.copy when sou…
URL: https://github.com/apache/crunch/pull/17

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 200837)
    Time Spent: 20m  (was: 10m)

> FileTargetImpl uses Distcp vs FileUtils.copy
> --------------------------------------------
>
>                 Key: CRUNCH-660
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-660
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Micah Whitacre
>            Assignee: Josh Wills
>            Priority: Major
>             Fix For: 1.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> So for handling multiple runtimes I'm not sure there is a way to solve this
> but documenting as a JIRA regardless.
> If you are running in a multi-cluster environment where you might want to
> read data from one cluster and then write the output on another cluster (e.g.
> generating HFiles to be loaded into a separate HBase cluster), the
> performance of moving files is noticeable. Specifically due to the fact that
> the moving of the files happens in the launcher/driver process versus as part
> of the node execution it seems.[1]
> An efficient option would be to kick off a DistCp instead but that would tie
> the target directly to a runtime which is not a great approach.
> [1] - https://github.com/apache/crunch/blob/5609b014378d3460a55ce25522f0c00659872807/crunch-core/src/main/java/org/apache/crunch/io/impl/FileTargetImpl.java#L157

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)