A more concise way of stating this would be to say that:

For each line item, I need to count all of the line items (including
this one) that _start_ with the same value, followed by either the end
of the line or a "/".

For example, using pseudo-regex syntax:

- A count of all of the lines that start with "Arts$" or "Arts/"
- A count of all of the lines that start with "Arts/Animation$" or
"Arts/Animation/", etc.

I should have led with that, but I'm still feeling out the problem a little bit.
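
One way to sketch this, assuming the whole list fits in memory: rather than testing every line against every other line, split each line on "/" and credit the line to each of its own prefixes. This is Python rather than the PHP used in the pipeline, purely for illustration, and the function name count_subtree_sizes is made up for this example.

```python
from collections import Counter


def count_subtree_sizes(paths):
    """For each prefix, count the lines equal to it or starting with it plus "/"."""
    counts = Counter()
    for path in paths:
        parts = path.split("/")
        # Credit this line to itself and to every ancestor prefix.
        for depth in range(1, len(parts) + 1):
            counts["/".join(parts[:depth])] += 1
    return counts


# A few lines taken from the snippet in the original message.
sample = [
    "Arts",
    "Arts/Animation",
    "Arts/Animation/Anime",
    "Arts/Animation/Anime/Characters",
    "Arts/Animation/Anime/Creators",
    "Arts/Animation/Anime/Creators/Studios",
]

counts = count_subtree_sizes(sample)
for path in sample:
    print(counts[path], path)
```

Because prefixes are built from whole path segments, "Arts" is never credited for a hypothetical line such as "Artsy/Foo". A prefix that never appears on a line of its own still gets a count, so look up only the paths that are actually in the file if that matters.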

On Fri, Oct 1, 2010 at 7:32 AM, Rob Wilkerson <rwilker...@lotame.com> wrote:
> I have a script that loads a list of ~800,000 category hierarchies,
> filters them a bit and streams them through a PHP script for some
> quick procedural work. The file contains one column and a snippet
> looks like this:
>
> Arts
> Arts/Animation
> Arts/Animation/Anime
> Arts/Animation/Anime/Characters
> Arts/Animation/Anime/Clubs_and_Organizations
> Arts/Animation/Anime/Collectibles
> Arts/Animation/Anime/Collectibles/Cels
> Arts/Animation/Anime/Collectibles/Models_and_Figures
> Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures
> Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Gundam
> Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Zoids
> Arts/Animation/Anime/Collectibles/Models_and_Figures/Models
> Arts/Animation/Anime/Collectibles/Models_and_Figures/Models/Gundam
> Arts/Animation/Anime/Collectibles/Shitajiki
> Arts/Animation/Anime/Creators
> Arts/Animation/Anime/Creators/Anno,_Hideaki
> Arts/Animation/Anime/Creators/Ikuhara,_Kunihiko
> Arts/Animation/Anime/Creators/Miyazaki,_Hayao
> Arts/Animation/Anime/Creators/Studios
> Arts/Animation/Anime/Creators/Studios/Studio_Ghibli
> Arts/Animation/Anime/Creators/Studios/Studio_Ghibli/Titles
> Arts/Animation/Anime/Distribution
> Arts/Animation/Anime/Distribution/Companies
>
> Now I need to take it one step further. I need to get a count of how
> many items are in "Arts", how many are in "Arts/Animation", etc. I
> know a grouping and count is involved, but I can't wrap my mind around
> how to get there since the category path depth is entirely variable
> and I need these numbers relative to the "whole" (i.e. I need to know
> how many times Arts/Animation/Anime appears rather than how many times
> Anime appears at any level).
>
> Any guidance would be much appreciated.
 