Olga Natkovich
Tue, 30 Sep 2008 09:54:12 -0700
The top of the trunk is stable and runs with Hadoop 18. For every
checkin, we make sure that all unit tests pass and we also run a
number of end-to-end tests.
The types branch is also in a pretty stable state, as the same checkin
rules apply there. However, given that this is all brand new code, I
would say it is a bit less stable. On the other hand, it has new
features and much higher performance numbers. It is also compatible with
Hadoop 18.
The release we pushed is from the trunk prior to integrating with Hadoop
18.
Olga
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [EMAIL PROTECTED] On Behalf Of Prashanth Pappu
> Sent: Tuesday, September 30, 2008 9:42 AM
> To: pig-user@incubator.apache.org
> Subject: Re: Union support
>
> More importantly, can you please tell us the svn version of
> the build you are using?
>
> Some of us use PIG extensively for our applications and it
> would be nice if we can start doing some kind of release management.
> I.e., though I want to upgrade PIG (to work with Hadoop 17/18
> etc.), using top of SVN seems a little risky.
> So, if we have some idea of which svn builds we consider to
> be fairly stable, we can upgrade a little conservatively.
>
> FYI, I've been using -
>
> For hadoop 16: svn 653894 (split command is buggy; rest seems ok)
>
> I wasn't sure which version was best for 1.0 release (is
> there a separate branch?), but
>
> For hadoop 17: svn 694017 (I haven't had a chance to fully test this
> version)
>
> Prashanth
> On Tue, Sep 30, 2008 at 9:03 AM, Olga Natkovich
> <[EMAIL PROTECTED]> wrote:
>
> > Can you try the code in the types branch and see if the problem
> > goes away?
> >
> > Olga
> >
> > > -----Original Message-----
> > > From: Arthur Zwiegincew [EMAIL PROTECTED]
> > > Sent: Monday, September 29, 2008 7:05 PM
> > > To: pig-user@incubator.apache.org
> > > Subject: Re: Union support
> > >
> > > Oh, I didn't realize cat was a Pig command... The result is still
> > > the same.
> > >
> > > Is this a regression? If so, what's the right build to use?
> > >
> > > Thanks,
> > > Arthur
> > >
> > > On Mon, Sep 29, 2008 at 6:59 PM, Mridul Muralidharan
> > > > <[EMAIL PROTECTED]> wrote:
> > >
> > > >
> > > > > I suggested store instead of dump to see if the problem is
> > > > > related to dump only or whether it is a general issue.
> > > >
> > > > > cat in pig works the same whether it is a file or a directory
> > > > > (appropriate files in dir of course).
> > > >
> > > > > Though looking at your ls output, I suspect the map output did
> > > > > not produce the required result ...
> > > >
> > > > - Mridul
> > > >
> > > >
> > > > Arthur Zwiegincew wrote:
> > > >
> > > > >> I created two scripts: the first one the same as before, but
> > > > >> using STORE instead of DUMP, and the second one, which loads the
> > > > >> stored file and dumps it. No difference; I just get
> > > > >> {(4, 20), (5, 20)}.
> > > >>
> > > >> Also, your suggestion of using cat indicates that you might be
> > > >> thinking about local mode (where union works). As I see it,
> > > > >> PigStorage() in Hadoop mode ends up creating a directory,
> > > > >> not just a single file:
> > > >>
> > > >> $ dir union.out
> > > >> total 8
> > > >> -rw-r--r-- 1 arthur staff 10B Sep 29 18:40 map-map
> > > >> -rw-r--r-- 1 arthur staff 0B Sep 29 18:40 part-00000
> > > >>
> > > >> -Arthur
> > > >>
> > > >> On Mon, Sep 29, 2008 at 6:36 PM, Mridul Muralidharan
> > > > >> <[EMAIL PROTECTED]> wrote:
> > > >>
> > > > >>> Hi Arthur,
> > > >>>
> > > >>> Does a store instead of dump work ?
> > > >>> Something like :
> > > >>>
> > > >>> -- start --
> > > > >>> data = load '/Users/arthur/tmp/data' as (x, y);
> > > > >>> data2 = load '/Users/arthur/tmp/data-2' as (x, y);
> > > > >>> both = union data, data2;
> > > > >>> store both into 'temp' using PigStorage();
> > > >>>
> > > >>> cat temp
> > > >>> -- end --
> > > >>>
> > > >>>
> > > >>>
> > > >>> Regards,
> > > >>> Mridul
> > > >>>
> > > >>> Arthur Zwiegincew wrote:
> > > >>>
> > > > >>>> I've come across a very basic problem: unions simply do
> > > > >>>> not work in Hadoop mode.
> > > >>>>
> > > >>>> data files:
> > > >>>>
> > > >>>> $ cat ~/tmp/data
> > > >>>> 1 1
> > > >>>> 2 1
> > > >>>> 3 10
> > > >>>>
> > > >>>> $ cat ~/tmp/data-2
> > > >>>> 4 20
> > > >>>> 5 20
> > > >>>>
> > > >>>> pig script:
> > > > >>>> data = load '/Users/arthur/tmp/data' as (x, y);
> > > > >>>> data2 = load '/Users/arthur/tmp/data-2' as (x, y);
> > > > >>>> both = union data, data2;
> > > > >>>> dump both;
> > > >>>>
> > > >>>> result:
> > > >>>> (4, 20)
> > > >>>> (5, 20)
> > > >>>>
> > > >>>>
> > > > >>>> I've opened a bug
> > > > >>>> <https://issues.apache.org/jira/browse/PIG-390>
> > > > >>>> on this, but there has been no response.
> > > >>>>
> > > >>>>
> > > >>>> Am I missing anything?
> > > >>>>
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Arthur
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>
> > > >
> > >
> >
>