[ https://issues.apache.org/jira/browse/CRUNCH-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960973#comment-14960973 ]
Micah Whitacre commented on CRUNCH-543:
---------------------------------------

Thanks for the patches [~aeckstein]. We'll probably need to work out whether this is going to conflict with @tomwhite's patch for https://issues.apache.org/jira/browse/CRUNCH-562

Also, I'm concerned about potentially keeping that many writers open per map task. The best practice for this target is to make sure all the instances of a single key are in the same partition[1], which helps to avoid having all of those writers open at once.

[1] - https://github.com/apache/crunch/blob/57235348d6628d28c3a869f23aca15888aa377be/crunch-core/src/main/java/org/apache/crunch/io/avro/AvroPathPerKeyTarget.java#L47

> AvroPathPerKeyTarget copy nested subdirectories
> -----------------------------------------------
>
>                 Key: CRUNCH-543
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-543
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Adric Eckstein
>            Assignee: Josh Wills
>             Fix For: 0.13.0
>
>         Attachments: CRUNCH-543.patch, CRUNCH-543b.patch, CRUNCH-543c.patch
>
>
> When using AvroPathPerKeyTarget to write out a subpath in the output
> directory using a String key, the key might indicate multiple subfolders:
>     Pair<String, String> kv = new Pair<String, String>("foo/bar", "value");
>     PTable<String, String> kvs =
>         pipeline.create(Arrays.asList(kv), Avros.tableOf(Avros.strings(), Avros.strings()));
>     PTables.asPTable(kvs).write(new AvroPathPerKeyTarget("output"));
> This throws the error:
> java.io.IOException: java.lang.IllegalArgumentException: Reducer output name 'bar' cannot be parsed
>       at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.handleMultiPaths(CrunchJobHooks.java:92)
>       ...
> In AvroPathPerKeyTarget the handleOutputs method would need to recursively
> copy subfolders (it currently only checks the first level in the output
> directory) to enable keys that define multiple subfolders.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
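
For illustration only, the recursive copy that the issue description asks for could look roughly like the sketch below. It is not the code from the attached patches or from Crunch's handleOutputs. It uses java.nio.file on a local filesystem as a stand-in for the Hadoop FileSystem API, and the class and method names (NestedOutputCopier, copyTree) are hypothetical. The idea is simply to walk the whole working directory rather than only its first level, and to preserve each file's relative subpath (for example foo/bar/...) under the final output directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper; local-filesystem stand-in for a Hadoop-based copy.
final class NestedOutputCopier {

    /**
     * Copies every regular file found at any depth under srcRoot into
     * dstRoot, keeping the file's path relative to srcRoot.
     * Returns the list of destination paths that were written.
     */
    static List<Path> copyTree(Path srcRoot, Path dstRoot) throws IOException {
        List<Path> copied = new ArrayList<>();
        // Files.walk visits the whole tree, not just the first level.
        try (Stream<Path> walk = Files.walk(srcRoot)) {
            for (Path src : (Iterable<Path>) walk::iterator) {
                if (!Files.isRegularFile(src)) {
                    continue; // directories are created on demand below
                }
                // e.g. srcRoot/foo/bar/part-0.avro -> dstRoot/foo/bar/part-0.avro
                Path dst = dstRoot.resolve(srcRoot.relativize(src).toString());
                Files.createDirectories(dst.getParent());
                Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                copied.add(dst);
            }
        }
        return copied;
    }
}
```

With a key such as "foo/bar", a file written under the working directory at foo/bar/ would end up at the same relative location under the output directory, instead of the copy stopping at foo.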