[ https://issues.apache.org/jira/browse/CRUNCH-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535476#comment-14535476 ]
Gabriel Reid commented on CRUNCH-509:
-------------------------------------

At first, looking at the AvroOutputFormats, I was a bit confused about how that could still work, but after looking at it a bit more it makes sense. I did find those plain-text Avro schemas that were available in the job configuration handy for debugging sometimes, but obviously they shouldn't be in there anymore if they don't need to be.

As for the Spark stuff, yeah, it's a bit hacky-looking, but I don't see any problem with it (or a better option) as long as the outputs are being written out in a loop like that.

> Crunch with Spark doesn't name all outputs
> ------------------------------------------
>
>                 Key: CRUNCH-509
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-509
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>   Affects Versions: 0.11.0
>            Reporter: Micah Whitacre
>            Assignee: Josh Wills
>             Fix For: 0.12.0
>
>         Attachments: CRUNCH-509.patch, CRUNCH-509b.patch
>
>
> Crunch currently does not "name" all outputs when running with a
> SparkPipeline. This becomes a problem as some Targets (based on CRUNCH-82)
> have coded in checks to ensure that the name is populated.
> Specifically, the implementation I'm running into issues with is the Kite
> DatasetTarget[2].
> Need to read up a bit on context to see if it is a Crunch/Kite issue or where
> it is easiest/correct to fix. [~jwills] or [~tomwhite] feedback would be
> welcome.
> [1] -
> https://github.com/apache/crunch/blob/3ab0b078c47f23b3ba893fdfb05fd723f663d02b/crunch-spark/src/main/java/org/apache/crunch/impl/spark/SparkRuntime.java#L337
> [2] -
> https://github.com/kite-sdk/kite/blob/e080f0237e7383a16fff8547ad43387ccf55c473/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L178

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)