Prashanth Pappu
Thu, 05 Jun 2008 15:31:33 -0700
(a) I see a lot of places where PIG doesn't correctly deal with results
that are empty bags.
Here's an example: counting tuples. Let's say I want to count the number of
tuples in 'b' (a subset of 'a'). I can do the following:
a = load 'xyz' as (x,y,z);
b = filter a by x==X;
c = group b all;
d = foreach c generate COUNT(b);
Ideally, we want d to be (0) if b has no tuples and non-zero otherwise.
Unfortunately, if b is empty, c is also empty! This is a bug because it
causes d to be empty or null rather than (0).
Instead, if b is empty, c should ideally be (all, {}), which would make
d = (0).
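To make the difference concrete, here is a rough Python model of the two behaviours. It is only a sketch of the semantics I'm describing, not how PIG is actually implemented; the function names (group_all_observed, group_all_desired, count_per_group) are made up for illustration.

```python
from itertools import groupby

def group_all_observed(bag):
    """Model of what I see: an empty input produces no group at all."""
    # groupby over an empty list yields nothing, so the result is an empty list
    return [(key, list(rows)) for key, rows in groupby(bag, key=lambda _: "all")]

def group_all_desired(bag):
    """Model of what I'd expect: always exactly one ('all', bag) tuple."""
    return [("all", list(bag))]

def count_per_group(grouped):
    # Analogous to: foreach c generate COUNT(b);
    return [(len(rows),) for _, rows in grouped]

empty_b = []                      # the filter matched nothing
some_b = [(1, 2, 3), (1, 5, 6)]   # the filter matched two tuples

observed_empty = count_per_group(group_all_observed(empty_b))
desired_empty = count_per_group(group_all_desired(empty_b))
observed_some = count_per_group(group_all_observed(some_b))
desired_some = count_per_group(group_all_desired(some_b))
```

The two models agree whenever b is non-empty. They differ only in the empty case, where the observed model gives no output tuple at all and the desired model gives a single tuple holding 0.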
(b) Is there a different way of computing the number of tuples in b that
will always (irrespective of whether b is empty or not) give the correct
answer?
(c) I also see that PIG supports data maps. But I haven't seen any examples
that illustrate how to create or manipulate data maps. Is there any such
documentation?
thanks,
Prashanth