Chris Olston
Thu, 05 Jun 2008 16:09:12 -0700
The question is whether one can construct a Pig program that gives the semantics you want. Unfortunately, off the top of my head, the answer seems to be 'no'. If that's the case, we need to look at what needs to be added or changed in the language to enable testing for empty outermost tables. (If I'm overlooking something, I'm sure one of my colleagues will chime in :)
-Chris

On Jun 5, 2008, at 3:31 PM, Prashanth Pappu wrote:
(a) I see a lot of places where PIG doesn't correctly deal with results that are empty bags. Here's an example: counting tuples. Let's say I want to count the number of tuples in 'b' (a subset of 'a'). I can do the following:

    a = load 'xyz' as (x, y, z);
    b = filter a by x == X;
    c = group b all;
    d = foreach c generate COUNT(b);

Ideally, we want d to be (0) if b has no tuples and non-zero otherwise. Unfortunately, if b is empty, c is also empty! This is buggy because it causes d to be empty or null rather than (0). If b is empty, c should ideally be c = (all, {}), which would make d = (0).

(b) Is there a different way of computing the number of tuples in b that will always (whether or not b is empty) give the correct answer?

(c) I also see that PIG supports data maps, but I haven't seen any examples that illustrate how to create or manipulate them. Is there any such documentation?

thanks,
Prashanth
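To make the difference in (a) concrete, here is a small Python model of the two possible behaviors of "group b all" over an empty relation. It is only an illustration, not Pig code and not how Pig is implemented. The function names (group_all_current, group_all_desired, count_per_group) and the sample data are hypothetical. "Current" models the behavior reported above (no group at all when the input is empty). "Desired" models the proposed behavior (always exactly one 'all' group, whose bag may be empty).

```python
def group_all_current(rows):
    # Hypothetical model of the reported behavior:
    # grouping an empty relation by ALL yields no groups at all.
    if not rows:
        return []
    return [("all", list(rows))]


def group_all_desired(rows):
    # Hypothetical model of the proposed behavior:
    # always exactly one ('all', bag) group, even when the bag is empty.
    return [("all", list(rows))]


def count_per_group(groups):
    # Models "d = foreach c generate COUNT(b);": one output tuple per group.
    return [(len(bag),) for _key, bag in groups]


# Sample data standing in for relation 'a' with fields (x, y, z).
a = [(1, "p", "q"), (2, "r", "s")]
X = 99  # hypothetical filter constant that matches no tuple
b = [t for t in a if t[0] == X]  # b is empty

print(count_per_group(group_all_current(b)))  # []      (d is empty)
print(count_per_group(group_all_desired(b)))  # [(0,)]  (d = (0))
```

Under the "current" model, COUNT never runs because there is no group to apply it to, so d comes out empty instead of (0); the "desired" model fixes this by guaranteeing a single group regardless of input size.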
--
Christopher Olston, Ph.D.
Sr. Research Scientist
Yahoo! Research