Olga Natkovich
Thu, 05 Jun 2008 16:07:08 -0700
I agree with you about the group. Could you please open a JIRA issue for it?
I don't think there is a workaround for this issue.
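To make the behaviour from the quoted message concrete, here is a small Python model of it. This is only an illustrative sketch, not Pig's implementation; the function names (group_all, count_per_group, group_all_with_empty_bag) are made up for the example. It assumes that grouping emits one record per key actually seen in the input, which is why an empty input produces no group and therefore no count.

```python
from collections import defaultdict


def group_all(rows):
    """Model of `group b all`: every row is filed under the constant key 'all'.

    A group exists only if at least one row was seen, so an empty
    input produces no groups at all.
    """
    groups = defaultdict(list)
    for row in rows:
        groups["all"].append(row)
    return list(groups.items())


def count_per_group(grouped):
    """Model of `foreach c generate COUNT(b)`: one count per group."""
    return [len(bag) for _key, bag in grouped]


def group_all_with_empty_bag(rows):
    """Behaviour requested in the quoted message: always emit (all, {...})."""
    return [("all", list(rows))]


non_empty = [(1, 2, 3), (1, 5, 6)]
empty = []

print(count_per_group(group_all(non_empty)))             # [2]
print(count_per_group(group_all(empty)))                 # []
print(count_per_group(group_all_with_empty_bag(empty)))  # [0]
```

With the first model an empty input yields an empty result rather than a single 0, which matches the symptom reported below; the last function shows the (all, {}) behaviour the report asks for.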
Pig does have limited support for maps. None of the existing
expressions/operators creates a map. The only way to get a map is to have
it in your input data or for your UDF to generate it. If you do have
a map, you can retrieve individual values as follows:
A = load 'data' as (map);
B = foreach A generate map#'key1', map#'key2' ...
where key1 and key2 are keys in the map.
Olga
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [EMAIL PROTECTED] On Behalf Of Prashanth Pappu
> Sent: Thursday, June 05, 2008 3:31 PM
> To: pig-user@incubator.apache.org
> Subject: Dealing with empty data bags
>
> (a) I see a lot of places where PIG doesn't correctly
> deal with results that are empty bags.
>
> Here's an example - Counting Tuples. Let's say I want to
> count number of tuples in 'b' ( a subset of 'a'). I can do
> the following -
>
> a = load 'xyz' as (x,y,z);
> b = filter a by x==X;
> c = group b all;
> d = foreach c generate COUNT(b);
>
> Ideally, we want d to be (0) if b has no tuples and non-zero
> otherwise.
> Unfortunately, if b is empty, c is also empty! This is buggy
> because it causes d to be empty or null and not (0).
>
> Whereas, if b is empty, c should ideally be, c = (all, {}).
> Which will make d = (0).
>
> (b) Is there a different way of computing the number of
> tuples in b that will always (irrespective of whether b is
> empty or not) give the correct answer?
>
> (c) I also see that PIG supports data maps. But I haven't
> seen any examples that illustrate how to create or manipulate
> data maps. Is there any such documentation?
>
> thanks,
> Prashanth
>