[jira] [Commented] (CRUNCH-509) Crunch with Spark doesn't name all outputs

Gabriel Reid (JIRA) Fri, 08 May 2015 13:46:16 -0700

    [ https://issues.apache.org/jira/browse/CRUNCH-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535476#comment-14535476 ]


Gabriel Reid commented on CRUNCH-509:
-------------------------------------

At first, looking at the AvroOutputFormats, I was a bit confused about how that
could still work, but after looking at it a bit more it makes sense. I did find
the plain-text Avro schemas that were available in the job configuration handy
for debugging sometimes, but obviously they shouldn't be in there anymore if
they don't need to be.

As for the Spark stuff, yeah, it's a bit hacky-looking, but I don't see any 
problem with it (or better option) as long as the outputs are being written out 
in a loop like that.

> Crunch with Spark doesn't name all outputs
> ------------------------------------------
>
>                 Key: CRUNCH-509
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-509
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.11.0
>            Reporter: Micah Whitacre
>            Assignee: Josh Wills
>             Fix For: 0.12.0
>
>         Attachments: CRUNCH-509.patch, CRUNCH-509b.patch
>
>
> Crunch currently does not "name" all outputs when running with a 
> SparkPipeline.  This becomes a problem as some Targets (based on CRUNCH-82) 
> have coded-in checks to ensure that the name is populated.  
> Specifically, the implementation I'm running into issues with is the Kite 
> DatasetTarget[2].
> Need to read up a bit on context to see if it is a Crunch/Kite issue or where 
> it is easiest/correct to fix.  [~jwills] or [~tomwhite] feedback would be 
> welcome.
> [1] - 
> https://github.com/apache/crunch/blob/3ab0b078c47f23b3ba893fdfb05fd723f663d02b/crunch-spark/src/main/java/org/apache/crunch/impl/spark/SparkRuntime.java#L337
> [2] - 
> https://github.com/kite-sdk/kite/blob/e080f0237e7383a16fff8547ad43387ccf55c473/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L178



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
