So, if you had to do this for a "really large" file and wanted to do it "the proper way," you would write the same logic as a Pig UDF that returns a bag of children, then FLATTEN that bag to get one row per partial hierarchy. Then GROUP BY that partial hierarchy and COUNT.
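A minimal sketch of that UDF-plus-FLATTEN/GROUP/COUNT idea in plain Python, under assumptions the thread does not state: each row is assumed to hold one node's full path as a "/"-delimited string (e.g. "a/b/c"), and `partial_hierarchies` and `count_descendants` are hypothetical names. Every prefix of a path is one "partial hierarchy," so counting rows per prefix gives descendants plus the node itself, which is what the thread below settles on.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def partial_hierarchies(path: str, sep: str = "/") -> List[Tuple[str]]:
    """Stand-in for the hypothetical UDF: one row in, a bag of prefixes out.

    Assumes `path` is a delimited hierarchy string such as "a/b/c".
    Each bag element is a 1-tuple, mirroring a Pig bag of tuples.
    """
    parts = [p for p in path.split(sep) if p]  # drop empty segments
    return [(sep.join(parts[:i]),) for i in range(1, len(parts) + 1)]


def count_descendants(paths: Iterable[str], sep: str = "/") -> Counter:
    """Emulate FLATTEN, then GROUP BY prefix and COUNT."""
    counts: Counter = Counter()
    for path in paths:
        # FLATTEN: one output row per partial hierarchy of this input row
        for (prefix,) in partial_hierarchies(path, sep):
            # GROUP BY prefix + COUNT: tally rows at or below each node
            counts[prefix] += 1
    return counts


if __name__ == "__main__":
    rows = ["a", "a/b", "a/b/c", "a/d"]
    print(sorted(count_descendants(rows).items()))
    # [('a', 4), ('a/b', 2), ('a/b/c', 1), ('a/d', 1)]
```

In an actual Pig job the per-row function is the only piece you would write yourself; the flattening, grouping, and counting would be done by Pig's own FLATTEN, GROUP, and COUNT so the work scales across the cluster.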
In Pig 0.8 you will be able to write this UDF in Python, if you like.

-D

On Fri, Oct 1, 2010 at 5:43 AM, Rob Wilkerson <rwilker...@lotame.com> wrote:
> On Fri, Oct 1, 2010 at 8:23 AM, David Vrensk <da...@icehouse.se> wrote:
> > Yup. But the point is to count children, so if there is no child on the
> > row, there is nothing to count.
>
> I suppose the point is to count descendants + the base, but I should
> have been clearer.
>
> > BTW, you didn't say if you wanted to count children or descendants (i.e.
> > children and children's children). From your follow-up, I gather it's
> > about descendants.
>
> This looks very much like what I'm after. I'll verify against the
> data, of course, but thank you very much.
>
> --
> +rw