pig-user  

Re: Dealing with empty data bags

Chris Olston
Thu, 05 Jun 2008 18:06:59 -0700

Probably the best fix is to redefine GROUP ALL so that in all cases it outputs a table with exactly one record. In the case of an empty input table it would produce an output record containing an empty bag. Is that what you have in mind, Olga?
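
A rough Python model of the difference may help make the proposal concrete. This is only a sketch: bags are modeled as plain lists, and the names `group_all_current`, `group_all_proposed` and `count_per_group` are made up for illustration and are not Pig APIs.

```python
# Illustrative model only; bags are plain Python lists and the
# function names are hypothetical, not part of Pig.

def group_all_current(bag):
    """Behaviour reported in this thread: an empty input yields no records."""
    return [("all", list(bag))] if bag else []


def group_all_proposed(bag):
    """Proposed behaviour: always exactly one record, possibly holding an empty bag."""
    return [("all", list(bag))]


def count_per_group(grouped):
    """Model of `foreach c generate COUNT(b)`: one count per grouped record."""
    return [(len(inner),) for _key, inner in grouped]


if __name__ == "__main__":
    empty = []
    print(count_per_group(group_all_current(empty)))   # []
    print(count_per_group(group_all_proposed(empty)))  # [(0,)]
```

Under the proposed semantics the count over an empty input comes out as a single (0) record, whereas the current behaviour produces no output record at all.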

-Chris


On Jun 5, 2008, at 4:05 PM, Olga Natkovich wrote:

I agree with you about the group. Could you please open a JIRA about it?
I don't think there is a workaround for this issue.

Pig does have limited support for maps. None of the existing
expressions/operators create a map. The only way to get a map is to have
one in your input data or for your UDF to generate it. If you do have
a map, you can retrieve individual values as follows:

A = load 'data' as (map);
B = foreach A generate map#'key1', map#'key2' ...

where key1 and key2 are keys in the map.
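
As a loose analogy, the lookup above behaves like a dictionary access applied to every record. The Python sketch below models it; `lookup` and `project_keys` are hypothetical helper names, and the choice to return a null (None) for a missing key is an assumption of this sketch rather than something stated in this thread.

```python
# Illustrative model only: a Pig map is represented as a Python dict,
# and `lookup` is a hypothetical helper standing in for map#'key'.

def lookup(m, key):
    # Assumption: a missing key yields a null (None) rather than an error.
    return m.get(key)


def project_keys(records, keys):
    """Model of `foreach A generate map#'key1', map#'key2'`."""
    return [tuple(lookup(m, k) for k in keys) for m in records]


if __name__ == "__main__":
    records = [{"key1": "a", "key2": "b"}, {"key1": "c"}]
    print(project_keys(records, ["key1", "key2"]))
```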

Olga

-----Original Message-----
From: [EMAIL PROTECTED]
[EMAIL PROTECTED] On Behalf Of Prashanth Pappu
Sent: Thursday, June 05, 2008 3:31 PM
To: pig-user@incubator.apache.org
Subject: Dealing with empty data bags

(a) I see a lot of places where PIG doesn't correctly
deal with results that are empty bags.

Here's an example: counting tuples. Let's say I want to
count the number of tuples in 'b' (a subset of 'a'). I can do
the following:

a = load 'xyz' as (x,y,z);
b = filter a by x==X;
c = group b all;
d = foreach c generate COUNT(b);

Ideally, we want d to be (0) if b has no tuples and non-zero
otherwise.
Unfortunately, if b is empty, c is also empty! This is a bug
because it causes d to be empty or null rather than (0).

Instead, if b is empty, c should ideally be (all, {}),
which would make d = (0).

(b) Is there a different way of computing the number of
tuples in b that will always (irrespective of whether b is
empty or not) give the correct answer?

(c) I also see that PIG supports data maps. But I haven't
seen any examples that illustrate how to create or manipulate
data maps. Is there any such documentation?

thanks,
Prashanth


--
Christopher Olston, Ph.D.
Sr. Research Scientist
Yahoo! Research