[ https://issues.apache.org/jira/browse/CRUNCH-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306041#comment-14306041 ]
Ryan Brush commented on CRUNCH-481:
-----------------------------------

This may be more appropriate for the related CDK-756 bug, but has anyone gotten this to work with a Hadoop 2 or CDH setup? I get the stack trace below when doing so. Interestingly, when debugging against a Hadoop 1 build I don't see the CompositeOutputCommitter.setupJob being called at all, which avoids the creation of the redundant dataset.

(This can be reproduced by running the crunch_write_multiple_datasets.patch on CDK-756 against this code with a Hadoop 2 profile.)

15/02/04 15:30:21 INFO jobcontrol.CrunchControlledJob: Job status available at: http://localhost:8080/
15/02/04 15:30:21 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.crunch.io.CrunchOutputs$CompositeOutputCommitter
15/02/04 15:30:21 WARN mapred.LocalJobRunner: job_local1926953006_0002
org.kitesdk.data.DatasetExistsException: Descriptor directory already exists: file:/var/folders/wy/tcxd96vx4vb_8m98zsjrn1vnlv9lnn/T/1423085421414-0/ns/.temp/job_local1926953006_0002/mr/job_local1926953006_0002/.metadata
	at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:192)
	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:136)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.createJobDataset(DatasetKeyOutputFormat.java:537)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.access$200(DatasetKeyOutputFormat.java:64)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.setupJob(DatasetKeyOutputFormat.java:358)
	at org.apache.crunch.io.CrunchOutputs$CompositeOutputCommitter.setupJob(CrunchOutputs.java:302)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:371)

> Support independent output committers for multiple outputs
> ----------------------------------------------------------
>
>                 Key: CRUNCH-481
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-481
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>            Reporter: Aniket Kulkarni
>            Assignee: Josh Wills
>            Priority: Minor
>             Fix For: 0.12.0
>
>         Attachments: CRUNCH-481.patch, CRUNCH-481.patch, CRUNCH-481.patch,
> CRUNCH-481c.patch
>
>
> I faced this issue while trying to write to Kite and HDFS in the same
> pipeline. A similar issue was logged for Kite[1][2].
> I was attempting to write a PCollection to Kite and a different PTable to
> HDFS as a text file. The write to Kite succeeded, however the write to HDFS
> only produced a _SUCCESS file with no text file.
> Commenting out the write to Kite seems to solve the issue and I can see the
> text file being written.
> [1] - https://issues.cloudera.org/browse/CDK-756
> [2] -
> http://mail-archives.apache.org/mod_mbox/crunch-dev/201401.mbox/%3ccaf-wd4qcue0toh3qewpdnnom3u786pvjlgh7t6go_abctpl...@mail.gmail.com%3E

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)