Thanks Dmitriy, this is exactly what I need. FYI, I did run into one bug when issuing a STORE statement the way the JavaDocs document it:

    STORE A INTO '/my/home/output' USING MultiStorage('/my/home/output','0', 'none', '\t');

Pig would create a file '/my/home/output', and then an exception would be thrown when MultiStorage tried to make a directory under '/my/home/output'. The workaround that worked for me was to specify a dummy location as the first path instead, like so:

    STORE A INTO '/my/home/output/temp' USING MultiStorage('/my/home/output','0', 'none', '\t');

On Tue, Dec 15, 2009 at 1:06 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> Bill,
> A custom storefunc should do the trick. See
> https://issues.apache.org/jira/browse/PIG-958 (aka
> piggybank.storage.MultiStorage) for a jumping-off point.
>
> -D
>
> On Tue, Dec 15, 2009 at 1:59 PM, Bill Graham <billgra...@gmail.com> wrote:
> > Hi,
> >
> > I'm pretty sure the answer to my question is no, but I have to ask. Is it
> > possible within Pig to store different groups of data into different output
> > files where the grouping is dynamic (i.e. not known ahead of time)? Here's
> > what I'm trying to do...
> >
> > I've got a script that reads log files of URLs and generates counts for a
> > given time period. The URLs might have a 'tag' querystring param though, and
> > in that case I want to get the most popular URLs for each tag output to its
> > own file.
> >
> > My data looks like this and is ordered by tag asc, count desc:
> >
> > [tag] [timeinterval] [url] [count]
> >
> > I need to do something like so:
> >
> > for each tag group found
> >     store all records in file foo_[tag].txt
> >
> > I ultimately need these files on local disk and I'm looking for a better way
> > to do so than generating a file of N unique tags in HDFS, reading it from
> > Java, submitting N jobs with the tag name substituted into a script file,
> > followed by N copyToLocal calls.
> >
> > At least two possible solutions come to mind, but I'm curious if there's
> > another that I'm overlooking:
> > 1. In Java, submit dynamic Pig commands to an instance of PigServer. I'd
> >    still need a unique tag file for this case.
> > 2. Maybe with a custom store function?
> >
> > thanks,
> > Bill