On Fri, Jul 1, 2011 at 1:43 PM, Gwern Branwen <gwe...@gmail.com> wrote:
> Athas on #haskell wondered how many dependencies the average Haskell
> package had. I commented that it seemed like some fairly simple
> scripting to find out, and as these things tend to go, I wound up
> doing a complete solution myself.
>
> First, we get most/all of Hackage locally to examine, as tarballs:
>
>    for package in `cabal list | grep '\*' | tr -d '\*'`; do cabal fetch $package; done

I think the index tarball has all the info you need, and it would be
faster to retrieve and process if you or anyone else needs to get the
.cabal files again:

http://hackage.haskell.org/packages/archive/00-index.tar.gz (2.2 MB)

The set of the latest package sdists is also available:

http://hackage.haskell.org/cgi-bin/hackage-scripts/archive.tar (~150 MB)

--Rogan

> Then we cd .cabal/packages/hackage.haskell.org
>
> Now we can run a command which extracts the .cabal file from each
> tarball to standard output:
>
>    find . -name "*.tar.gz" -exec tar --wildcards "*.cabal" -Oxf {} \;
>
> We could grep for 'build-depends' or something, but that gives
> unreliable dirty results. (>80k items, resulting in a hard to believe
> 87k total deps and an average of 27 deps.) So instead, we use the
> Cabal library and write a program to parse Cabal files & spit out the
> dependencies, and we feed each .cabal into that:
>
>    find . -name "*.tar.gz" -exec sh -c 'tar --wildcards "*.cabal" -Oxf {} | runhaskell ~/deps.hs' \;
>
> And what is deps.hs? Turns out to be surprisingly easy to parse a
> String, extract the Library and Executable AST, and grab the
> [Dependency] field, and then print it out (code is not particularly
> clean):
>
>    import Distribution.Package
>    import Distribution.PackageDescription
>    import Distribution.PackageDescription.Parse
>
>    main :: IO ()
>    main = do cbl <- getContents
>              let desc = parsePackageDescription cbl
>              case desc of
>                ParseFailed _ -> return ()
>                ParseOk _ d -> putStr $ unlines $ map show $
>                    map (\(Dependency x _) -> x) $ extractDeps d
>
>    extractDeps :: GenericPackageDescription -> [Dependency]
>    extractDeps d = ldeps ++ edeps
>      where ldeps = case (condLibrary d) of
>                      Nothing -> []
>                      Just c -> condTreeConstraints c
>            edeps = concat $ map (condTreeConstraints . snd) $ condExecutables d
>
> So what are the results? (The output of one run is attached.) I get
> 18,134 dependencies, having run on 3,137 files, or 5.8 dependencies
> per package.
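
For anyone trying the index-tarball route, here is a minimal shell sketch. It assumes the index stores each file as <package>/<version>/<package>.cabal and includes every uploaded version, so only the newest entry per package should be fed to deps.hs. The helper name latest_cabals is made up for illustration, and GNU sort is assumed for version ordering on the second key.

```shell
# Sketch: list the newest .cabal entry per package in a Hackage index tarball.
# Assumes entries look like <package>/<version>/<package>.cabal and that
# GNU sort is available (the V key flag gives version ordering, 1.9 < 1.10).
# latest_cabals is a hypothetical helper name, not part of cabal or Hackage.
latest_cabals() {
    tar -tzf "$1" \
        | grep '\.cabal$' \
        | awk -F/ '{ print $1, $2, $0 }' \
        | sort -k1,1 -k2,2V \
        | awk '{ latest[$1] = $3 } END { for (p in latest) print latest[p] }' \
        | sort
}

# Usage (needs the index downloaded first, and deps.hs from the quoted message):
#   wget http://hackage.haskell.org/packages/archive/00-index.tar.gz
#   latest_cabals 00-index.tar.gz | while read -r entry; do
#       tar -xzOf 00-index.tar.gz "$entry" | runhaskell ~/deps.hs
#   done
```

The package name and version are split into separate sort keys so that the version comparison only ever sees the version string. The usage loop rescans the archive once per package; unpacking the index to a directory once and reading the files from disk would likely be quicker.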
>
> --
> gwern
> http://www.gwern.net
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe