pig-user  

RE: Dealing with empty data bags

Olga Natkovich
Thu, 05 Jun 2008 16:07:08 -0700

I agree with you about the group. Could you please open a JIRA for it?
I don't think there is a workaround for this issue.
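
To make the group issue concrete, here is a rough Python model (not Pig code) of the behavior described in the quoted message below, next to the behavior it proposes. The helper names are hypothetical and exist only for this illustration. They are assumptions about how to picture the semantics, not Pig internals.

```python
# Illustrative model only; these helpers are hypothetical and not part of Pig.

def group_all_reported(bag):
    """Behavior reported in this thread: an empty input yields no group at all."""
    return [("all", bag)] if bag else []

def group_all_proposed(bag):
    """Behavior proposed in this thread: always emit one ('all', bag) tuple."""
    return [("all", bag)]

def count_per_group(groups):
    """Stand-in for: foreach c generate COUNT(b)."""
    return [(len(bag),) for _key, bag in groups]

empty_b = []  # the filter matched nothing
reported = count_per_group(group_all_reported(empty_b))
proposed = count_per_group(group_all_proposed(empty_b))
print(reported)  # []
print(proposed)  # [(0,)]
```

With a non-empty bag both variants give the same count, so the difference only shows up when the filter matches nothing.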

Pig does have limited support for maps. None of the existing
expressions/operators creates a map. The only way to get a map is to have
it in your input data or to have your UDF generate it. If you do have
a map, you can retrieve individual values as follows:

A = load 'data' as (map);
B = foreach A generate map#'key1', map#'key2' ...

where key1 and key2 are keys in the map.
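
As a rough picture of what the lookup does, the sketch below models each row's map as a Python dict (not Pig code). The sample rows and the `project` helper are hypothetical, and treating a missing key as a null (None here) is an assumption rather than something confirmed in this thread.

```python
# Illustrative model only: each row carries one map field, represented as a dict.
rows = [
    {"key1": "a", "key2": "b"},
    {"key1": "c"},  # key2 is absent in this row
]

def project(row, keys):
    """Stand-in for: generate map#'key1', map#'key2'.
    Missing keys become None (assumed to correspond to a null)."""
    return tuple(row.get(k) for k in keys)

projected = [project(r, ["key1", "key2"]) for r in rows]
print(projected)  # [('a', 'b'), ('c', None)]
```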

Olga

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [EMAIL PROTECTED] On Behalf Of Prashanth Pappu
> Sent: Thursday, June 05, 2008 3:31 PM
> To: pig-user@incubator.apache.org
> Subject: Dealing with empty data bags
> 
> (a) I see a lot of places where PIG doesn't correctly 
> deal with results that are empty bags.
> 
> Here's an example - Counting Tuples. Let's say I want to 
> count the number of tuples in 'b' (a subset of 'a'). I can do 
> the following -
> 
> a = load 'xyz' as (x,y,z);
> b =  filter a by x==X;
> c = group b all;
> d = foreach c generate COUNT(b);
> 
> Ideally, we want d to be (0) if b has no tuples and non-zero 
> otherwise.
> Unfortunately, if b is empty, c is also empty! This is buggy 
> because it causes d to be empty or null and not (0).
> 
> If b is empty, c should ideally be (all, {}), 
> which would make d = (0).
> 
> (b) Is there a different way of computing the number of 
> tuples in b that will always (irrespective of whether b is 
> empty or not) give the correct answer?
> 
> (c) I also see that PIG supports data maps. But I haven't 
> seen any examples that illustrate how to create or manipulate 
> data maps. Is there any such documentation?
> 
> thanks,
> Prashanth
>