pig-user  

RE: Union support

Olga Natkovich
Tue, 30 Sep 2008 09:07:39 -0700

Can you try the code in the types branch and see if the problem goes away?

Olga 

> -----Original Message-----
> From: Arthur Zwiegincew [EMAIL PROTECTED] 
> Sent: Monday, September 29, 2008 7:05 PM
> To: pig-user@incubator.apache.org
> Subject: Re: Union support
> 
> Oh, I didn't realize cat was a Pig command... The result is 
> still the same.
> 
> Is this a regression? If so, what's the right build to use?
> 
> Thanks,
> Arthur
> 
> On Mon, Sep 29, 2008 at 6:59 PM, Mridul Muralidharan
> <[EMAIL PROTECTED]> wrote:
> 
> >
> > I suggested store instead of dump to see whether the problem is
> > related to dump only or is a general issue.
> >
> > cat in Pig works the same whether the path is a file or a directory
> > (it reads the appropriate files in the directory, of course).
> >
> > Though looking at your ls output, I suspect the map output did not 
> > produce the required result ...
> >
> > - Mridul
> >
> >
> > Arthur Zwiegincew wrote:
> >
> >> I created two scripts: the first one the same as before, but using
> >> STORE instead of DUMP, and the second one, which loads the stored
> >> file and dumps it. No difference: I just get {(4, 20), (5, 20)}.
> >>
> >> Also, your suggestion of using cat indicates that you might be
> >> thinking about local mode (where union works). As I see it,
> >> PigStorage() in Hadoop mode ends up creating a directory, not just
> >> a single file:
> >>
> >> $ dir union.out
> >> total 8
> >> -rw-r--r--  1 arthur  staff    10B Sep 29 18:40 map-map
> >> -rw-r--r--  1 arthur  staff     0B Sep 29 18:40 part-00000
> >>
> >> -Arthur
> >>
> >> On Mon, Sep 29, 2008 at 6:36 PM, Mridul Muralidharan
> >> <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi Arthur,
> >>>
> >>> Does a store instead of dump work?
> >>> Something like:
> >>>
> >>> -- start --
> >>> data = load '/Users/arthur/tmp/data' as (x, y);
> >>> data2 = load '/Users/arthur/tmp/data-2' as (x, y);
> >>> both = union data, data2;
> >>> store both into 'temp' using PigStorage();
> >>>
> >>> cat temp
> >>> -- end --
> >>>
> >>>
> >>>
> >>> Regards,
> >>> Mridul
> >>>
> >>> Arthur Zwiegincew wrote:
> >>>
> >>>> I've come across a very basic problem: unions simply do not work in
> >>>> Hadoop mode.
> >>>>
> >>>> data files:
> >>>>
> >>>> $ cat ~/tmp/data
> >>>> 1 1
> >>>> 2 1
> >>>> 3 10
> >>>>
> >>>> $ cat ~/tmp/data-2
> >>>> 4 20
> >>>> 5 20
> >>>>
> >>>> pig script:
> >>>> data = load '/Users/arthur/tmp/data' as (x, y);
> >>>> data2 = load '/Users/arthur/tmp/data-2' as (x, y);
> >>>> both = union data, data2;
> >>>> dump both;
> >>>>
> >>>> result:
> >>>> (4, 20)
> >>>> (5, 20)
> >>>>
> >>>>
> >>>> I've opened a bug <https://issues.apache.org/jira/browse/PIG-390>
> >>>> on this, but there has been no response.
> >>>>
> >>>>
> >>>> Am I missing anything?
> >>>>
> >>>>
> >>>> Thanks,
> >>>> Arthur
> >>>>
> >>>>
> >>>>
> >>
> >
>