Chris Olston
Thu, 05 Jun 2008 18:06:59 -0700
-Chris

On Jun 5, 2008, at 4:05 PM, Olga Natkovich wrote:
I agree with you about the group. Could you please open a JIRA about it? I don't think there is a workaround for this issue.

Pig does have limited support for maps. None of the existing expressions/operators create a map. The only way to get a map is to have them in your input data or for your UDF to generate them. If you do have a map, you can retrieve individual values as follows:

    A = load 'data' as (map);
    B = foreach A generate map#'key1', map#'key2' ...

where key1 and key2 are keys in the map.

Olga

-----Original Message-----
From: [EMAIL PROTECTED] [EMAIL PROTECTED] On Behalf Of Prashanth Pappu
Sent: Thursday, June 05, 2008 3:31 PM
To: pig-user@incubator.apache.org
Subject: Dealing with empty data bags

(a) I see a lot of places where PIG doesn't correctly deal with results that are empty bags. Here's an example: counting tuples. Let's say I want to count the number of tuples in 'b' (a subset of 'a'). I can do the following:

    a = load 'xyz' as (x,y,z);
    b = filter a by x==X;
    c = group b all;
    d = foreach c generate COUNT(b);

Ideally, we want d to be (0) if b has no tuples and non-zero otherwise. Unfortunately, if b is empty, c is also empty! This is buggy because it causes d to be empty or null and not (0). Whereas, if b is empty, c should ideally be c = (all, {}), which will make d = (0).

(b) Is there a different way of computing the number of tuples in b that will always (irrespective of whether b is empty or not) give the correct answer?

(c) I also see that PIG supports data maps. But I haven't seen any examples that illustrate how to create or manipulate data maps. Is there any such documentation?

thanks,
Prashanth

-- Christopher Olston, Ph.D. Sr. Research Scientist Yahoo! Research
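
The empty-bag behavior described in point (a) of the quoted message can be illustrated outside Pig. The following is a minimal Python sketch, not Pig's implementation: the function names `group_all_observed`, `group_all_proposed`, and `count_per_group` are hypothetical, and the sample tuples and the filter value 99 are made up for illustration. It assumes, as the thread reports, that grouping emits one record per group key actually seen in the input, so an empty input yields no records at all.

```python
from collections import defaultdict


def group_all_observed(tuples):
    """Model of the reported behavior: one output record per group key seen.

    With empty input no key is ever seen, so the result is empty.
    """
    groups = defaultdict(list)
    for t in tuples:
        groups["all"].append(t)
    return [(key, bag) for key, bag in groups.items()]


def group_all_proposed(tuples):
    """Model of the behavior requested in the thread: always emit (all, bag)."""
    return [("all", list(tuples))]


def count_per_group(grouped):
    """Model of 'foreach c generate COUNT(b)': one count per grouped record."""
    return [(len(bag),) for _key, bag in grouped]


# Hypothetical sample data standing in for 'xyz'.
a = [(1, "p", "q"), (2, "r", "s")]
b = [t for t in a if t[0] == 99]  # the filter matches nothing, so b is empty

observed = count_per_group(group_all_observed(b))
proposed = count_per_group(group_all_proposed(b))
print(observed)  # [] : no group record, so no count is produced
print(proposed)  # [(0,)] : the single 'all' group holds an empty bag
```

Under these assumptions the count step never runs in the observed model because there is no grouped record to iterate over, which matches the report that d comes out empty rather than (0).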