Micah Whitacre created CRUNCH-506:
-------------------------------------
Summary: Default To.textFile to use TextFileSourceTarget
Key: CRUNCH-506
URL: https://issues.apache.org/jira/browse/CRUNCH-506
Project: Crunch
Issue Type: Improvement
Components: Core
Affects Versions: 0.11.0
Reporter: Micah Whitacre
Assignee: Micah Whitacre
A consumer ran into an interesting situation with code like the following:
{code}
PCollection<String> output = ...
output.write(To.textFile(path));
pipeline.done();
long size = output.length().getValue();
{code}
This code failed with an exception like the following:
{noformat}
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.JavaMain],
main() threw exception, org.apache.crunch.CrunchRuntimeException:
java.io.IOException: No files found to materialize at: /tmp/crunch-107739816/p8
org.apache.oozie.action.hadoop.JavaMainException:
org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found
to materialize at: /tmp/crunch-107739816/p8
at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:58)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
{noformat}
I believe this is because To.textFile(...) returns just a TextFileTarget, which
can be written to but not read back. The length() call therefore falls back to
the intermediate output, which the done() call has already cleaned up. Switching
To.textFile(...) to a TextFileSourceTarget lets the code succeed.
It seems we could switch To.textFile(...) to use the SourceTarget implementation,
which would make this behavior less surprising to consumers.
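
To make the suspected mechanism concrete, here is a minimal toy model in plain Java. It is only a sketch of the behavior described above, not Crunch's actual code: every class and method name in it (MaterializeSketch, WriteOnlyTarget, ReadableTarget, Output, lengthAfterDone) is hypothetical, and the in-memory map stands in for the file system. It assumes that a collection written to a write-only target keeps reading from its temporary path, while one written to a readable target can be read from the final path instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not Crunch's API or implementation.
class MaterializeSketch {

    // Stand-in for a file system: path -> lines stored at that path.
    static final Map<String, List<String>> FS = new HashMap<>();

    // An output that can only be written, analogous to a plain Target.
    static class WriteOnlyTarget {
        final String path;
        WriteOnlyTarget(String path) { this.path = path; }
        boolean readable() { return false; }
    }

    // An output that can also be read back, analogous to a SourceTarget.
    static class ReadableTarget extends WriteOnlyTarget {
        ReadableTarget(String path) { super(path); }
        @Override boolean readable() { return true; }
    }

    // A collection that remembers where its data can be read from.
    static class Output {
        final String tempPath; // intermediate output, removed by done()
        String readFrom;       // path used when the data is materialized

        Output(String tempPath, List<String> data) {
            this.tempPath = tempPath;
            this.readFrom = tempPath;
            FS.put(tempPath, new ArrayList<>(data));
        }

        void write(WriteOnlyTarget target) {
            FS.put(target.path, new ArrayList<>(FS.get(tempPath)));
            // Only a readable target can replace the temp path as the read location.
            if (target.readable()) {
                readFrom = target.path;
            }
        }

        // Models pipeline.done(): intermediate data is cleaned up.
        void done() { FS.remove(tempPath); }

        long length() {
            List<String> lines = FS.get(readFrom);
            if (lines == null) {
                throw new IllegalStateException(
                    "No files found to materialize at: " + readFrom);
            }
            return lines.size();
        }
    }

    // Runs write -> done -> length; returns the length, or -1 if reading fails.
    static long lengthAfterDone(WriteOnlyTarget target) {
        FS.clear();
        Output output = new Output("/tmp/sketch/p1", Arrays.asList("a", "b", "c"));
        output.write(target);
        output.done();
        try {
            return output.length();
        } catch (IllegalStateException e) {
            return -1;
        }
    }
}
```

In this model, lengthAfterDone(new WriteOnlyTarget(...)) fails because the only readable copy was the temporary one, while lengthAfterDone(new ReadableTarget(...)) succeeds by reading the final output. That is the difference the proposed default would give To.textFile(...) callers, if the explanation above is correct.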