A more concise way of stating this: for each line item, I need to count all of the line items (including this one) that _start_ with the same value plus an optional "/".
For example, using pseudo-regex syntax:

- A count of all of the lines that start with "Arts$" or "Arts/"
- A count of all of the lines that start with "Arts/Animation$" or "Arts/Animation/", etc.

I should have led with that, but I'm still feeling out the problem a little bit.

On Fri, Oct 1, 2010 at 7:32 AM, Rob Wilkerson <rwilker...@lotame.com> wrote:
> I have a script that loads a list of ~800,000 category hierarchies,
> filters them a bit and streams them through a PHP script for some
> quick procedural work. The file contains one column and a snippet
> looks like this:
>
> Arts
> Arts/Animation
> Arts/Animation/Anime
> Arts/Animation/Anime/Characters
> Arts/Animation/Anime/Clubs_and_Organizations
> Arts/Animation/Anime/Collectibles
> Arts/Animation/Anime/Collectibles/Cels
> Arts/Animation/Anime/Collectibles/Models_and_Figures
> Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures
> Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Gundam
> Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Zoids
> Arts/Animation/Anime/Collectibles/Models_and_Figures/Models
> Arts/Animation/Anime/Collectibles/Models_and_Figures/Models/Gundam
> Arts/Animation/Anime/Collectibles/Shitajiki
> Arts/Animation/Anime/Creators
> Arts/Animation/Anime/Creators/Anno,_Hideaki
> Arts/Animation/Anime/Creators/Ikuhara,_Kunihiko
> Arts/Animation/Anime/Creators/Miyazaki,_Hayao
> Arts/Animation/Anime/Creators/Studios
> Arts/Animation/Anime/Creators/Studios/Studio_Ghibli
> Arts/Animation/Anime/Creators/Studios/Studio_Ghibli/Titles
> Arts/Animation/Anime/Distribution
> Arts/Animation/Anime/Distribution/Companies
>
> Now I need to take it one step further. I need to get a count of how
> many items are in "Arts", how many are in "Arts/Animation", etc. I
> know a grouping and count is involved, but I can't wrap my mind around
> how to get there since the category path depth is entirely variable
> and I need these numbers relative to the "whole" (i.e. I need to know
> how many times Arts/Animation/Anime appears rather than how many times
> Anime appears at any level).
>
> Any guidance would be much appreciated.
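
The counting rule described in this thread (each line counts toward itself and toward every ancestor path, matched on whole "/"-separated segments) could be sketched roughly as follows. This is only an illustration in Python, not the PHP script mentioned above; `count_by_prefix` and the `sample` list are hypothetical names, and the sketch assumes one category path per line with "/" as the only separator.

```python
from collections import Counter


def count_by_prefix(paths):
    """Map each path prefix to the number of lines equal to it or nested under it.

    Hypothetical helper: assumes one category path per item, "/"-separated.
    """
    counts = Counter()
    for raw in paths:
        path = raw.strip()
        if not path:
            continue  # skip blank lines
        parts = path.split("/")
        # Credit the line itself and every ancestor, so that
        # "Arts/Animation/Anime" adds one to "Arts", "Arts/Animation",
        # and "Arts/Animation/Anime".
        for depth in range(1, len(parts) + 1):
            counts["/".join(parts[:depth])] += 1
    return counts


# Small sample taken from the snippet in the original message.
sample = [
    "Arts",
    "Arts/Animation",
    "Arts/Animation/Anime",
    "Arts/Animation/Anime/Characters",
    "Arts/Animation/Anime/Creators",
    "Arts/Animation/Anime/Creators/Studios",
]

counts = count_by_prefix(sample)
for path in sample:
    print(path, counts[path])
```

Because prefixes are built from whole segments, a line such as "Artsy" would not be counted under "Arts", which matches the "Arts$" or "Arts/" rule. The approach makes a single pass over the input and keeps one counter per distinct prefix, which seems workable for a list of this size, though that has not been measured here.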