Micah Whitacre created CRUNCH-506:
-------------------------------------

             Summary: Default To.textFile to use TextFileSourceTarget
                 Key: CRUNCH-506
                 URL: https://issues.apache.org/jira/browse/CRUNCH-506
             Project: Crunch
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.11.0
            Reporter: Micah Whitacre
            Assignee: Micah Whitacre


A consumer ran into an interesting situation.  They had code like the following:

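{code}
PCollection<String> output = ...
output.write(To.textFile(path));
pipeline.done();  // also cleans up the pipeline's intermediate state

// length() is only evaluated here, after done() has already run
long size = output.length().getValue();
{code}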

This code failed with an exception like the following:

{noformat}
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.JavaMain], 
main() threw exception, org.apache.crunch.CrunchRuntimeException: 
java.io.IOException: No files found to materialize at: /tmp/crunch-107739816/p8
  org.apache.oozie.action.hadoop.JavaMainException: 
org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found 
to materialize at: /tmp/crunch-107739816/p8
  at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:58)
  at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
{noformat}

I believe this is because To.textFile(...) returns just a TextFileTarget, 
which can be written to but not read back.  So the length() call goes back 
to the intermediate state, which was already cleaned up by the done() call.  
Switching To.textFile(..) to a TextFileSourceTarget instead lets the code 
succeed.
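
To illustrate the suspected mechanism, here is a toy model in plain Java.  
It does not use Crunch at all: ToyPipeline, ToyCollection, and 
sizeAfterDone are hypothetical names made up for this sketch, and the 
"p0" file only stands in for the intermediate state.  It only assumes, as 
described above, that a write-only target leaves the intermediate copy as 
the sole place to read from, and that done() deletes that copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Toy model only: none of these classes are part of Crunch's real API.
final class MaterializeSketch {

    /** Hypothetical stand-in for a pipeline whose temp dir is deleted by done(). */
    static final class ToyPipeline {
        private final Path tempDir;

        ToyPipeline() {
            try {
                tempDir = Files.createTempDirectory("toy-crunch-");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /**
         * Writes the records to an intermediate file and to the output file.
         * readableTarget=false models a write-only target: the collection
         * only remembers the intermediate copy as a place to read from.
         */
        ToyCollection write(List<String> records, Path output, boolean readableTarget) {
            try {
                Path intermediate = tempDir.resolve("p0");
                Files.write(intermediate, records);
                Files.write(output, records);
                return new ToyCollection(readableTarget ? output : intermediate);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /** Deletes the intermediate state (deepest paths first). */
        void done() {
            try (Stream<Path> paths = Files.walk(tempDir)) {
                paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Hypothetical stand-in for a collection that is read lazily. */
    static final class ToyCollection {
        private final Path readFrom;

        ToyCollection(Path readFrom) {
            this.readFrom = readFrom;
        }

        /** Counts records by reading the file this collection points at. */
        long length() {
            if (!Files.exists(readFrom)) {
                throw new UncheckedIOException(
                        new IOException("No files found to materialize at: " + readFrom));
            }
            try (Stream<String> lines = Files.lines(readFrom)) {
                return lines.count();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Returns the record count read after done(), or -1 if the read fails. */
    static long sizeAfterDone(boolean readableTarget) {
        Path output;
        try {
            output = Files.createTempFile("toy-output-", ".txt");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        ToyPipeline pipeline = new ToyPipeline();
        ToyCollection collection =
                pipeline.write(Arrays.asList("a", "b", "c"), output, readableTarget);
        pipeline.done();
        try {
            return collection.length();
        } catch (UncheckedIOException e) {
            return -1L;
        } finally {
            output.toFile().delete();
        }
    }

    public static void main(String[] args) {
        System.out.println("write-only target: " + sizeAfterDone(false)); // -1
        System.out.println("readable target:   " + sizeAfterDone(true));  // 3
    }
}
```

In this model the write-only case fails after done() because the only 
copy it knows how to read is gone, while the readable case reads the 
final output.  Whether Crunch resolves the read path this way internally 
is an assumption here, not something I have confirmed in the code.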

It seems like we could switch To.textFile(..) to use the SourceTarget impl to 
make this less surprising/confusing to consumers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
