[jira] [Commented] (CRUNCH-509) Crunch with Spark doesn't name all outputs

Josh Wills (JIRA) Wed, 08 Apr 2015 08:07:41 -0700

    [ 
https://issues.apache.org/jira/browse/CRUNCH-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485358#comment-14485358
 ]


Josh Wills commented on CRUNCH-509:
-----------------------------------

[~tomwhite] the outputs in Crunch-on-Spark are not named, as Crunch-on-Spark 
doesn't use multiple outputs. Spark only writes one RDD at a time, so I hadn't 
bothered to name the outputs yet.

[~mkwhitacre] I _think_ that just naming all of the outputs "out0" should work 
in conjunction with the CrunchOutputFormat, but I don't have time to test it 
out today. I can take it for a spin tomorrow and see if I can get it working. 
It would be nice to start laying a foundation for multiple outputs in Spark.
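
A rough sketch of the idea, for illustration only. This is not Crunch or Kite code: `StrictTarget`, `configureAll`, and `DEFAULT_OUTPUT_NAME` are hypothetical names. `StrictTarget` only models a target that rejects an unpopulated output name, and the only detail taken from the discussion above is the fixed name "out0". It assumes that, because only one output is written at a time, reusing the same name for every target is safe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; none of these types are Crunch or Kite classes.
final class OutputNamingSketch {

    /** Models a target that refuses to be configured without an output name. */
    static final class StrictTarget {
        private final String label;

        StrictTarget(String label) {
            this.label = label;
        }

        String configure(String outputName) {
            // The kind of check that fails when the runtime passes no name.
            if (outputName == null || outputName.isEmpty()) {
                throw new IllegalArgumentException(
                    "output name must be populated for " + label);
            }
            return label + " -> " + outputName;
        }
    }

    /** Single fixed name, assuming only one output is written at a time. */
    static final String DEFAULT_OUTPUT_NAME = "out0";

    /** Configures each target in turn, always under the same fixed name. */
    static List<String> configureAll(List<StrictTarget> targets) {
        List<String> configured = new ArrayList<>();
        for (StrictTarget target : targets) {
            configured.add(target.configure(DEFAULT_OUTPUT_NAME));
        }
        return configured;
    }

    public static void main(String[] args) {
        List<StrictTarget> targets = new ArrayList<>();
        targets.add(new StrictTarget("datasetA"));
        targets.add(new StrictTarget("datasetB"));
        for (String line : configureAll(targets)) {
            System.out.println(line);
        }
    }
}
```

Whether a shared name actually satisfies the CrunchOutputFormat and the Kite DatasetTarget check is exactly what still needs testing.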

> Crunch with Spark doesn't name all outputs
> ------------------------------------------
>
>                 Key: CRUNCH-509
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-509
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.11.0
>            Reporter: Micah Whitacre
>            Assignee: Josh Wills
>             Fix For: 0.12.0
>
>
> Crunch currently does not "name" all outputs when running with a 
> SparkPipeline.  This becomes a problem as some Targets (based on CRUNCH-82) 
> have coded in checks to ensure that the name is populated.  
> Specifically the implementation I'm running into issues with is the Kite 
> DatasetTarget[2].
> Need to read up a bit on context to see if it is a Crunch/Kite issue or where 
> it is easiest/correct to fix.  [~jwills] or [~tomwhite] feedback would be 
> welcome.
> [1] - 
> https://github.com/apache/crunch/blob/3ab0b078c47f23b3ba893fdfb05fd723f663d02b/crunch-spark/src/main/java/org/apache/crunch/impl/spark/SparkRuntime.java#L337
> [2] - 
> https://github.com/kite-sdk/kite/blob/e080f0237e7383a16fff8547ad43387ccf55c473/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L178



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
