Micah Whitacre created CRUNCH-506:
-------------------------------------

             Summary: Default To.textFile to use TextFileSourceTarget
                 Key: CRUNCH-506
                 URL: https://issues.apache.org/jira/browse/CRUNCH-506
             Project: Crunch
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.11.0
            Reporter: Micah Whitacre
            Assignee: Micah Whitacre

A consumer ran into an interesting situation. They had code like the following:

{code}
PCollection<String> output = ...
output.write(To.textFile(path));
pipeline.done();
long size = output.length().getValue();
{code}

This code was failing with an exception like the following:

{noformat}
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.JavaMain], main() threw exception, org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-107739816/p8
org.apache.oozie.action.hadoop.JavaMainException: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-107739816/p8
	at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:58)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
{noformat}

I believe this is because To.textFile(...) uses just TextFileTarget, which can be written to but not read back. The length() call therefore goes back to the intermediate state, which had already been cleaned up by the done() call. Switching To.textFile(..) to TextFileSourceTarget lets the code succeed.

It seems we could switch To.textFile(..) to use the SourceTarget implementation to make this less surprising and confusing to consumers.
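The suspected mechanism can be modeled with a small JDK-only sketch. This is a toy illustration of the reasoning above, not Crunch code: MaterializeDemo, countLines, writeDoneThenLength, and the part-00000 file name are all hypothetical, and the "intermediate" and "output" locations are just local temp directories standing in for the pipeline's temp path and the written path.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these names are Crunch classes or methods.
final class MaterializeDemo {

    // Counts records in a directory, failing in the way the report
    // describes when the expected file is no longer there.
    static long countLines(Path dir) throws IOException {
        Path part = dir.resolve("part-00000");
        if (!Files.exists(part)) {
            throw new IOException("No files found to materialize at: " + dir);
        }
        return Files.readAllLines(part, StandardCharsets.UTF_8).size();
    }

    // Simulates write(...), then done(), then length().
    // readableTarget == false: write-only target, so the count has to
    //   come from the intermediate directory.
    // readableTarget == true: source/target, so the count can be read
    //   back from the written output.
    static long writeDoneThenLength(boolean readableTarget) throws IOException {
        Path tmp = Files.createTempDirectory("intermediate");
        Path out = Files.createTempDirectory("output");
        List<String> records = Arrays.asList("a", "b", "c");
        Files.write(tmp.resolve("part-00000"), records, StandardCharsets.UTF_8);
        Files.write(out.resolve("part-00000"), records, StandardCharsets.UTF_8);

        // Stand-in for done(): the intermediate state is cleaned up.
        Files.delete(tmp.resolve("part-00000"));
        Files.delete(tmp);

        return countLines(readableTarget ? out : tmp);
    }
}
```

Under these assumptions, writeDoneThenLength(false) throws an IOException whose message begins with "No files found to materialize at:", while writeDoneThenLength(true) returns the record count from the output directory. That matches the observation that switching to a source/target lets length() succeed after done().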