You don't want to go overboard here.

1. You *want* a distinct blob of lookup code for each different key type, because you really do want a different lookup structure for each key type (see the sketch below).

2. For the most part you *don't want* a different blob of lookup code for each value type, because almost all value types are represented uniformly by a pointer.
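
For example, with the associated data types from the subject line, each key type can pick its own lookup structure while the value type stays a uniform parameter. A minimal sketch (the Key class and its instances here are illustrative, not from a real library):

  {-# LANGUAGE TypeFamilies, FlexibleInstances #-}
  module MapClass where

  import qualified Data.IntMap as IM
  import qualified Data.Map as M
  import Prelude hiding (lookup)

  -- One lookup structure per key type, chosen via an associated data
  -- family; the value type v stays a single boxed parameter.
  class Key k where
    data Map k :: * -> *
    empty  :: Map k v
    lookup :: k -> Map k v -> Maybe v

  instance Key Int where
    newtype Map Int v = IntMap (IM.IntMap v)   -- Patricia trie keyed on Int
    empty               = IntMap IM.empty
    lookup k (IntMap m) = IM.lookup k m

  instance Key String where
    newtype Map String v = StringMap (M.Map String v)  -- ordered balanced tree
    empty                  = StringMap M.empty
    lookup k (StringMap m) = M.lookup k m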

So it'd be silly to generate all possible combinations of key and value types.  The exception to (2) is that you want different code for a handful of value types that you want to unbox into the tree structure itself: Int#, Float#, etc.  It would be good to design a convenient way to do that.  It's nothing directly to do with associated types.  We'd like to allow Maybe Int#, say, but we don't at the moment, because that data structure would really be represented differently.  Some kind of data type and code cloning (a la C++) is probably the right thing.  This is what Max meant by "just engineering", but it would require careful thought and design.
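
To illustrate the kind of cloning meant here, a hand-written clone of Maybe specialised to Int# might look like this (MaybeIntU and its constructors are made-up names, not an existing type):

  {-# LANGUAGE MagicHash #-}
  module MaybeIntU where

  import GHC.Exts (Int#, (+#))

  -- Maybe Int# is rejected because an unboxed type cannot instantiate
  -- Maybe's boxed type parameter, so the type is cloned by hand and
  -- the Int# field lives directly in the constructor, with no pointer.
  data MaybeIntU = NothingU | JustU Int#

  addOne :: MaybeIntU -> MaybeIntU
  addOne NothingU   = NothingU
  addOne (JustU i#) = JustU (i# +# 1#)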

Simon

From: glasgow-haskell-users-boun...@haskell.org 
[mailto:glasgow-haskell-users-boun...@haskell.org] On Behalf Of Johan Tibell
Sent: 12 August 2010 16:56
To: Simon Marlow
Cc: glasgow-haskell-users
Subject: Re: Using associated data types to create unpacked data structures

On Thu, Aug 12, 2010 at 11:28 AM, Simon Marlow <marlo...@gmail.com> wrote:
Rather than try to solve this problem in one go, I would go for a low-tech 
approach for now: write a TH library to generate the code, and ask the user to 
declare the versions they need.  To make a particular version, the user would 
say something like

 module MapIntDouble (module MapIntDouble) where
 import TibbeMagicMapGenerator
 make_me_a_map ...

There's no type class, of course, so you can't write functions that work over all specialised Maps.  But this at least lets you generate optimised maps with only a little boilerplate, and get the performance boost you were after.
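
For concreteness, the generated MapIntDouble module might contain something along these lines (a hand-written sketch, not the output of any real generator; balancing is omitted):

  module MapIntDouble (Map, empty, insert, lookup) where

  import Prelude hiding (lookup)

  -- An (unbalanced) binary search tree with both the Int key and the
  -- Double value unpacked into each node, avoiding one pointer
  -- indirection per field.
  data Map
    = Tip
    | Bin {-# UNPACK #-} !Int
          {-# UNPACK #-} !Double
          !Map !Map

  empty :: Map
  empty = Tip

  lookup :: Int -> Map -> Maybe Double
  lookup _ Tip = Nothing
  lookup k (Bin k' v l r)
    | k < k'    = lookup k l
    | k > k'    = lookup k r
    | otherwise = Just v

  insert :: Int -> Double -> Map -> Map
  insert k v Tip = Bin k v Tip Tip
  insert k v (Bin k' v' l r)
    | k < k'    = Bin k' v' (insert k v l) r
    | k > k'    = Bin k' v' l (insert k v r)
    | otherwise = Bin k v l r   -- replace existing value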

To get a better idea of how many specialized maps the user would have to create under this scheme, I ran an analysis of the Chromium codebase [1]. The Chromium codebase is not small, but some companies have codebases that are several orders of magnitude larger, which makes the results below more of a lower bound than an upper bound on the number of specialized maps one might need in a program.

    $ git clone http://src.chromium.org/git/chromium.git
    Initialized empty Git repository in /tmp/chromium/.git/
    remote: Counting objects: 548595, done.
    remote: Compressing objects: 100% (167063/167063), done.
    remote: Total 548595 (delta 401993), reused 477011 (delta 343049)
    Receiving objects: 100% (548595/548595), 1.02 GiB | 24.44 MiB/s, done.
    Resolving deltas: 100% (401993/401993), done.
    $ cd chromium
    $ find . \( -name \*.h -o -name \*.cc \) -exec egrep -o "map<[^,]+, ?[^>]+>" {} \; | sort -u | wc -l
    220
    $ find . \( -name \*.h -o -name \*.cc \) -exec wc -l {} \; | awk '{tot=tot+$1} END {print tot}'
    81328

So in a codebase of about 80 KLOC there are 220 unique key/value combinations. While the numbers might not translate exactly to Haskell, they still indicate that the number of modules a user would have to create (and put somewhere in the source tree) would be quite large.

1. http://src.chromium.org/git/chromium.git

Cheers,
Johan
