On Fri, Oct 1, 2010 at 14:00, Rob Wilkerson <rwilker...@lotame.com> wrote:

> On Fri, Oct 1, 2010 at 7:44 AM, David Vrensk <da...@icehouse.se> wrote:
> > I would just preprocess the file with Perl or Ruby:
> >
> > perl -ne 'next unless m#/#; s#(.*)/(.*)#$1\t$2#; print;' infile > outfile
>
> What is the "#" representing? I have a semi-educated guess, but I
> can't find that particular symbol in any examples.
>

They are alternative regexp delimiters.  Usually, we write regexen delimited
by slashes, but I wanted to avoid that since the only important character in
the regexp is a slash.
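
Ruby has the same feature, spelled %r{...}. Here is a minimal sketch of the one-liner above in Ruby, in case it helps to see it spelled out. The method name split_last_slash is made up for illustration; the rest is plain String#sub:

```ruby
# Rough Ruby equivalent of the Perl one-liner. %r{...} is an alternative
# regexp delimiter, so the slash inside the pattern needs no escaping.
# NOTE: split_last_slash is a hypothetical name, used only for this example.
def split_last_slash(line)
  # Rows without a slash have no parent/child pair, so skip them.
  return nil unless line.include?('/')

  # The first (.*) is greedy, so the split happens at the LAST slash.
  line.sub(%r{(.*)/(.*)}, "\\1\t\\2")
end

puts split_last_slash("Arts/Animation/Anime")
```

The greedy first group is what puts everything up to the final slash in the first column and the leaf name in the second.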


> Also, as far as I can tell, this regex also misses the top level path
> because it has no children. For example, the "Arts" path. It catches
> "Arts/Anime" and below nicely, of course.
>

Yup.  But the point is to count children, so if there is no child on the
row, there is nothing to count.

BTW, you didn't say if you wanted to count children or descendants (i.e.
children and children's children).  From your follow-up, I gather it's about
descendants.  Try this:

--------8<----------------
#! /usr/bin/env ruby

counts = Hash.new(0)

while (line = ARGF.gets)
  line.chomp!
  segments = line.split '/'
  key = []
  segments.each do |s|
    key << s
    counts[key.join('/')] += 1
  end
end

counts.keys.sort.each do |key|
  puts("%5d %s" % [counts[key], key])
end
--------8<----------------

which with your data returns

   23 Arts
   22 Arts/Animation
   21 Arts/Animation/Anime
    1 Arts/Animation/Anime/Characters
    1 Arts/Animation/Anime/Clubs_and_Organizations
    9 Arts/Animation/Anime/Collectibles
    1 Arts/Animation/Anime/Collectibles/Cels
    6 Arts/Animation/Anime/Collectibles/Models_and_Figures
    3 Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures
    1 Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Gundam
    1 Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Zoids
    2 Arts/Animation/Anime/Collectibles/Models_and_Figures/Models
    1 Arts/Animation/Anime/Collectibles/Models_and_Figures/Models/Gundam
    1 Arts/Animation/Anime/Collectibles/Shitajiki
    7 Arts/Animation/Anime/Creators
    1 Arts/Animation/Anime/Creators/Anno,_Hideaki
    1 Arts/Animation/Anime/Creators/Ikuhara,_Kunihiko
    1 Arts/Animation/Anime/Creators/Miyazaki,_Hayao
    3 Arts/Animation/Anime/Creators/Studios
    2 Arts/Animation/Anime/Creators/Studios/Studio_Ghibli
    1 Arts/Animation/Anime/Creators/Studios/Studio_Ghibli/Titles
    2 Arts/Animation/Anime/Distribution
    1 Arts/Animation/Anime/Distribution/Companies
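
Note that each number above includes the row itself, since every row bumps the counter for its own full path as well as for each ancestor. If it turns out you want direct children only, a sketch along these lines might do. It assumes every category appears on exactly one row of the input; count_children and the sample list are made up for illustration:

```ruby
# Count direct children only: each row adds 1 to its immediate parent.
# ASSUMPTION: every category path appears on exactly one input row.
# NOTE: count_children is a hypothetical name, used only for this example.
def count_children(paths)
  counts = Hash.new(0)
  paths.each do |path|
    segments = path.chomp.split('/')
    next if segments.size < 2            # top-level rows have no parent

    parent = segments[0...-1].join('/')  # drop the last segment
    counts[parent] += 1
  end
  counts
end

# Small illustrative sample, not the real data file.
sample = [
  "Arts",
  "Arts/Animation",
  "Arts/Animation/Anime",
  "Arts/Animation/Anime/Characters",
  "Arts/Animation/Anime/Creators"
]

children = count_children(sample)
children.keys.sort.each do |key|
  puts("%5d %s" % [children[key], key])
end
```

Leaf categories do not show up in that output at all, because nothing ever increments them; look them up in the hash directly if you need explicit zeroes.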


HTH,

/David
-- 
David Vrensk
Systems developer, ICE House AB
Mobile: +46 703 74 69 00
