On Fri, Oct 1, 2010 at 14:00, Rob Wilkerson <rwilker...@lotame.com> wrote:
> On Fri, Oct 1, 2010 at 7:44 AM, David Vrensk <da...@icehouse.se> wrote:
> > I would just preprocess the file with Perl or Ruby:
> >
> > perl -ne 'next unless m#/#; s#(.*)/(.*)#\1\t\2#; print;' infile > outfile
>
> What is the "#" representing?  I have a semi-educated guess, but I
> can't find that particular symbol in any examples.

They are alternative regexp delimiters.  Usually, we write regexen
delimited by slashes, but I wanted to avoid that since the only important
character in the regexp is a slash.

> Also, as far as I can tell, this regex also misses the top level path
> because it has not children.  For example, the "Arts" path.  It catches
> "Arts/Anime" and below nicely, of course.

Yup.  But the point is to count children, so if there is no child on the
row, there is nothing to count.

BTW, you didn't say if you wanted to count children or descendants (i.e.
children and children's children).  From your follow-up, I gather it's
about descendants.  Try this:

--------8<----------------
#! /usr/bin/env ruby

counts = Hash.new(0)
while (line = ARGF.gets)
  line.chomp!
  segments = line.split '/'
  key = []
  segments.each do |s|
    key << s
    counts[key.join('/')] += 1
  end
end

counts.keys.sort.each do |key|
  puts("%5d %s" % [counts[key], key])
end
--------8<----------------

which with your data returns

   23 Arts
   22 Arts/Animation
   21 Arts/Animation/Anime
    1 Arts/Animation/Anime/Characters
    1 Arts/Animation/Anime/Clubs_and_Organizations
    9 Arts/Animation/Anime/Collectibles
    1 Arts/Animation/Anime/Collectibles/Cels
    6 Arts/Animation/Anime/Collectibles/Models_and_Figures
    3 Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures
    1 Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Gundam
    1 Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Zoids
    2 Arts/Animation/Anime/Collectibles/Models_and_Figures/Models
    1 Arts/Animation/Anime/Collectibles/Models_and_Figures/Models/Gundam
    1 Arts/Animation/Anime/Collectibles/Shitajiki
    7 Arts/Animation/Anime/Creators
    1 Arts/Animation/Anime/Creators/Anno,_Hideaki
    1 Arts/Animation/Anime/Creators/Ikuhara,_Kunihiko
    1 Arts/Animation/Anime/Creators/Miyazaki,_Hayao
    3 Arts/Animation/Anime/Creators/Studios
    2 Arts/Animation/Anime/Creators/Studios/Studio_Ghibli
    1 Arts/Animation/Anime/Creators/Studios/Studio_Ghibli/Titles
    2 Arts/Animation/Anime/Distribution
    1 Arts/Animation/Anime/Distribution/Companies

HTH,
/David

-- 
David Vrensk
Systems developer, ICE House AB
Mobile: +46 703 74 69 00
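
P.S. On the children-versus-descendants question: the script above adds one to every prefix of every row, so each count covers the path's own row (when it has one) plus all rows beneath it. If direct children turn out to be what is wanted instead, something along these lines might do. It is only a rough sketch: the method name count_direct_children is made up for illustration, and it assumes one slash-separated path per row, as in the data shown.

```ruby
require 'set'

# Hypothetical helper (name invented for this sketch): maps each path
# to the number of distinct direct children seen under it.
def count_direct_children(lines)
  children = Hash.new { |hash, key| hash[key] = Set.new }
  lines.each do |line|
    segments = line.chomp.split('/')
    # Record each segment as a child of the prefix one level above it.
    (1...segments.size).each do |i|
      parent = segments[0, i].join('/')
      children[parent] << segments[i]
    end
  end
  # Paths without any children never get an entry.
  children.transform_values(&:size)
end

# Only read from ARGF when run directly as a script.
if __FILE__ == $PROGRAM_NAME
  count_direct_children(ARGF.each_line).sort.each do |path, count|
    puts("%5d %s" % [count, path])
  end
end
```

A Set is used so that a child reached through several deeper rows is still counted once. Leaf paths are left out of the result entirely, which matches the "nothing to count" remark above.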