[
https://issues.apache.org/jira/browse/CRUNCH-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845026#comment-13845026
]
Micah Whitacre commented on CRUNCH-306:
---------------------------------------
I think my/Bryan's use case is slightly different from Jeremy's in that we
don't expect the files to be named "key.avro"; instead we were thinking of
/<basePath>/<some key derived path>/part-*-*.avro. This would eliminate the
thread contention if a key existed in multiple partitions.
Jeremy, would that work for you? Since the AvroFileSource would support reading
from a directory, you could still consume it in a similar fashion even though
it isn't a single file.
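
To make the proposed layout concrete, here is a rough sketch in plain Java (no
Crunch or Hadoop calls). The class KeyPathLayout and the method outputPath are
hypothetical names used only for illustration, and the part-<m|r>-<NNNNN>
naming is an assumption about how the part-*-*.avro pattern would be filled in.
The point is only that two partitions holding the same key end up writing to
different files under the same key directory.

```java
import java.util.Locale;

class KeyPathLayout {
    // Hypothetical helper: builds <basePath>/<keyPath>/part-<m|r>-<NNNNN>.avro.
    // The "m"/"r" task marker and 5-digit partition number are assumptions.
    static String outputPath(String basePath, String keyPath, boolean mapOnly, int partition) {
        String taskType = mapOnly ? "m" : "r";
        return String.format(Locale.ROOT, "%s/%s/part-%s-%05d.avro",
                basePath, keyPath, taskType, partition);
    }

    public static void main(String[] args) {
        // Same key seen in two partitions: same directory, different part files.
        System.out.println(outputPath("/out", "us", false, 0));
        System.out.println(outputPath("/out", "us", false, 1));
    }
}
```

A reader pointed at the key directory would then pick up all of the part files
for that key.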
Looking at the AvroFilePerKeyTarget/AvroFilePerKeyOutputFormat, should we also
document the hint that sorting by key would help performance (less opening and
closing of files)? I'd guess most users will be doing a GBK to ensure a single
partition and would then get this naturally as part of the ungroup(), but that
wouldn't be the case if they are doing it in a map-only job.
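
A small simulation of why the sorting hint matters, in plain Java. It assumes
the output format keeps only one file open at a time and switches files
whenever the incoming key changes; that is my assumption for the sketch, not a
statement about what the patch does. PerKeyWriterSim and countOpens are
hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PerKeyWriterSim {
    // Counts file opens for a writer that holds one open file at a time
    // and opens a new one each time the key differs from the previous record.
    static int countOpens(List<String> keys) {
        int opens = 0;
        String current = null;
        for (String key : keys) {
            if (!key.equals(current)) {
                opens++;
                current = key;
            }
        }
        return opens;
    }

    public static void main(String[] args) {
        List<String> unsorted = Arrays.asList("a", "b", "a", "c", "b", "a");
        List<String> sorted = new ArrayList<>(unsorted);
        Collections.sort(sorted);

        System.out.println("unsorted opens: " + countOpens(unsorted)); // 6
        System.out.println("sorted opens:   " + countOpens(sorted));   // 3
    }
}
```

With sorted input the number of opens equals the number of distinct keys; with
unsorted input it can approach the number of records.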
> MultipleOutput Targets
> ----------------------
>
> Key: CRUNCH-306
> URL: https://issues.apache.org/jira/browse/CRUNCH-306
> Project: Crunch
> Issue Type: New Feature
> Components: IO
> Reporter: Josh Wills
> Attachments: CRUNCH-306.patch, CRUNCH-306b.patch
>
>
> A commonly desired feature for Crunch is the ability to write an output file
> for each key in a PTable/PGroupedTable containing the values associated with
> that key. We should find a way to support that one-output-per-key model.