So, if you had to do this for a really large file and wanted to do it
"the proper way", you would write the same logic as a Pig UDF that returns a
bag of children, then flatten the bag to get one row per partial
hierarchy. Then group by that partial hierarchy and count.

In Pig 0.8 you will be able to write this UDF in Python, if you like.
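
Roughly, the logic looks like the sketch below. This is plain Python rather
than an actual Pig UDF, and it assumes each row holds a "/"-delimited path
(the thread does not pin down the format). The function names are made up
for illustration. The first function plays the role of the UDF that returns
a bag; the loop and the Counter stand in for FLATTEN, GROUP and COUNT.

```python
from collections import Counter


def partial_hierarchies(path, sep="/"):
    """Return every prefix of a delimited path, shortest first.

    Assumption: a row looks like "a/b/c". Empty segments are ignored.
    """
    parts = [p for p in path.split(sep) if p]
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


def count_by_partial_hierarchy(paths, sep="/"):
    """Count how many rows fall at or under each partial hierarchy."""
    counts = Counter()
    for path in paths:
        # Equivalent of flattening the bag: one entry per prefix.
        counts.update(partial_hierarchies(path, sep))
    return counts


# Hypothetical sample rows, one per node in the hierarchy.
rows = ["a", "a/b", "a/b/c", "a/d"]
counts = count_by_partial_hierarchy(rows)
```

With one row per node, each count is the node itself plus its descendants,
so subtract one if you only want the descendants.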

-D

On Fri, Oct 1, 2010 at 5:43 AM, Rob Wilkerson <rwilker...@lotame.com> wrote:

> On Fri, Oct 1, 2010 at 8:23 AM, David Vrensk <da...@icehouse.se> wrote:
> > Yup.  But the point is to count children, so if there is no child on the
> > row, there is nothing to count.
>
> I suppose the point is to count descendants + the base, but I should
> have been more clear.
>
> > BTW, you didn't say if you wanted to count children or descendants (i.e.
> > children and children's children).  From your follow-up, I gather it's
> > about descendants
>
> This looks very much like what I'm after. I'll verify against the
> data, of course, but thank you very much.
>
>
> --
> +rw
