If your data is going through a reducer, there's support for something like this built into Crunch, although it's not (yet) very developer-friendly.
If you have a custom Partitioner that maps each key to a predetermined partition id, you can implement a custom FileNamingScheme[1] and have it map each output partition to a fixed filename that represents the content under that key. I believe most (or all) Target implementations can be instantiated with a FileNamingScheme object.

- Gabriel

[1] http://crunch.apache.org/apidocs/0.7.0/org/apache/crunch/io/FileNamingScheme.html

On Wed, Aug 7, 2013 at 3:04 PM, Micah Whitacre wrote:
> I believe you could accomplish this by creating PCollections for each of
> the key/values you want to persist and then writing[1] the PCollections out
> to whichever directories make the most sense.
>
> [1] - http://crunch.apache.org/apidocs/0.7.0/org/apache/crunch/Pipeline.html#write(org.apache.crunch.PCollection, org.apache.crunch.Target)
>
> On Wed, Aug 7, 2013 at 3:31 AM, Mridul Das wrote:
>
> > Hi,
> >
> > MultipleOutputs enables us to generate custom file names based on
> > keys/values.
> > How do we achieve this in Crunch?
> >
> > Regards,
> > Mridul
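The Partitioner-plus-FileNamingScheme suggestion above can be sketched as follows. This is a plain-Java model of the logic only, with no Crunch or Hadoop dependency, so it is not the Crunch API itself. The class name `KeyedOutputNaming`, its methods, and the example keys are hypothetical. The two methods loosely mirror Hadoop's `Partitioner.getPartition` and Crunch's `FileNamingScheme.getReduceOutputName`, simplified: the real methods also take value, `Configuration`, and `Path` arguments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the two pieces described in the reply; no Crunch or Hadoop
// classes are used. The class name, method names, and keys are hypothetical.
class KeyedOutputNaming {

  // Pre-determined keys; a key's position in this list is its partition id.
  private final List<String> keys;
  private final Map<String, Integer> partitionOf = new HashMap<>();

  KeyedOutputNaming(List<String> keys) {
    this.keys = new ArrayList<>(keys);
    for (int i = 0; i < this.keys.size(); i++) {
      // Reject duplicates so every key owns exactly one partition id.
      if (partitionOf.put(this.keys.get(i), i) != null) {
        throw new IllegalArgumentException("Duplicate key: " + this.keys.get(i));
      }
    }
  }

  // Mirrors what a custom Partitioner's getPartition(key, value, numPartitions)
  // would compute: every record with the same key lands in the same fixed partition.
  int getPartition(String key, int numPartitions) {
    if (numPartitions != keys.size()) {
      // Assumption: one output file per key requires exactly one reducer per key.
      throw new IllegalStateException(
          "Expected " + keys.size() + " partitions, got " + numPartitions);
    }
    Integer id = partitionOf.get(key);
    if (id == null) {
      throw new IllegalArgumentException("Unknown key: " + key);
    }
    return id;
  }

  // Mirrors what a FileNamingScheme's reduce-side naming method would return:
  // a filename derived from the key that owns this partition id.
  String getReduceOutputName(int partitionId) {
    return keys.get(partitionId);
  }
}
```

For example, with keys `orders`, `customers`, and `returns` and three reducers, every `customers` record goes to partition 1, and partition 1's output file is named `customers`. In a real job, the partitioning logic would live in a subclass of Hadoop's `Partitioner` (registered, e.g., through Crunch's `GroupingOptions`), and the naming logic in a `FileNamingScheme` implementation handed to the output `Target`. The fixed key-to-partition mapping is the design choice that makes "one file per key" work, which is also why it only applies when the data goes through a reducer.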
