Thanks Dmitriy, this is exactly what I need.

FYI, I did run into one bug, which shows up when making a request like this,
as documented in the JavaDocs:

STORE A INTO '/my/home/output' USING MultiStorage('/my/home/output','0',
'none', '\t');

Pig would create a file '/my/home/output' and then an exception would be
thrown when MultiStorage tried to make a directory under '/my/home/output'.
The workaround that worked for me was to instead specify a dummy location as
the first path like so:

STORE A INTO '/my/home/output/temp' USING
MultiStorage('/my/home/output','0', 'none', '\t');
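
For anyone who wants to see the collision in isolation, here is a rough local
filesystem analogy in Python. It is only a sketch and not MultiStorage's actual
code: store_by_key, the part-00000 file name, and the sample records are all
made up for illustration, and HDFS may report the error differently. It shows
that splitting records into one directory per key fails when the parent path
already exists as a plain file, and works once the dummy output sits inside
the parent directory instead.

```python
import os
import tempfile


def store_by_key(records, parent_dir, key_index=0, delimiter="\t"):
    """Append each record to parent_dir/<key>/part-00000 (hypothetical layout)."""
    for record in records:
        key = str(record[key_index])
        key_dir = os.path.join(parent_dir, key)
        # Raises OSError if parent_dir already exists as a regular file.
        os.makedirs(key_dir, exist_ok=True)
        with open(os.path.join(key_dir, "part-00000"), "a") as out:
            out.write(delimiter.join(str(field) for field in record) + "\n")


# Made-up sample rows in the [tag] [timeinterval] [url] [count] layout.
records = [
    ("news", "2009-12-15", "/a", 10),
    ("sports", "2009-12-15", "/b", 7),
]

with tempfile.TemporaryDirectory() as root:
    output = os.path.join(root, "output")

    # Failure mode: 'output' has already been created as a file.
    open(output, "w").close()
    try:
        store_by_key(records, output)
        collided = False
    except OSError:
        collided = True

    # Workaround analogue: the dummy file lives at output/temp,
    # so 'output' itself is a directory and per-key subdirectories fit.
    os.remove(output)
    os.makedirs(output)
    open(os.path.join(output, "temp"), "w").close()
    store_by_key(records, output)
    created = sorted(os.listdir(output))

print(collided, created)
```

The dummy entry ends up alongside the per-key directories, so it may need
cleaning up afterwards.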


On Tue, Dec 15, 2009 at 1:06 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> Bill,
> A custom storefunc should do the trick. See
> https://issues.apache.org/jira/browse/PIG-958  (aka
> piggybank.storage.MultiStorage) for a jumping-off point.
>
> -D
>
> On Tue, Dec 15, 2009 at 1:59 PM, Bill Graham <billgra...@gmail.com> wrote:
> > Hi,
> >
> > I'm pretty sure the answer to my question is no, but I have to ask. Is it
> > possible within Pig to store different groups of data into different output
> > files where the grouping is dynamic (i.e. not known ahead of time)? Here's
> > what I'm trying to do...
> >
> > I've got a script that reads log files of URLs and generates counts for a
> > given time period. The URLs might have a 'tag' querystring param though, and
> > in that case I want to get the most popular URLs for each tag output to its
> > own file.
> >
> > My data looks like this and is ordered by tag asc, count desc:
> >
> > [tag] [timeinterval] [url] [count]
> >
> > I need to do something like so:
> >
> > for each tag group found
> >  store all records in file foo_[tag].txt
> >
> > I ultimately need these files on local disk and I'm looking for a better way
> > to do so than generating a file of N unique tags in HDFS, reading it from
> > Java, submitting N jobs with the tag name substituted into a script file,
> > followed by N copyToLocal calls.
> >
> > At least two possible solutions come to mind, but I'm curious if there's
> > another that I'm overlooking:
> > 1. In Java, submit Pig commands dynamically to an instance of PigServer.
> > I'd still need a unique tag file for this case.
> > 2. Maybe with a custom store function??
> >
> > thanks,
> > Bill
> >
>
