No, I don't think that gets me all the way to what I want, although it
might be a reasonable approach to one *part* of it. The main idea I'm after
is efficient representations of things like HashMap, where nodes in a tree
have variable numbers of children. First, let's look at how HashMap is
actually defined (with inconsequential simplifications & clarifications):

data HashMap k v
    = Empty
    | BitmapIndexed !Word !(SmallArray (HashMap k v))
    | Leaf !Word !(Leaf k v)
    | Collision !Word !(SmallArray (Leaf k v))

The internal nodes are represented by BitmapIndexed constructors. The
biggest goal is to flatten those BitmapIndexed values out, so let's start
with just that. Conceptually, imagine we have

data HashMap# :: Type -> Type -> TYPE 'UnliftedRep -- hopefully soon
('BoxedRep 'Unlifted)
pattern BitmapIndexed# :: Word# -> SmallArray# (HashMap k v) -> HashMap# k v

I want a value produced by `BitmapIndexed#` to be a single heap object with:

1. Enough information to tell that it was produced by `BitmapIndexed#`.
2. The included `Word#` (representing the hash).
3. Pointers to all the children.

One way to think of this is as unpacking arrays into (unlifted)
user-defined types. Something like

unlifted data HashMap#
  = Empty#
  | BitmapIndexed# !Word {-# UnpackArray #-} {-# Unpack #-} !(SmallArray
(HashMap k v))
  | ...

Where these fake pragmas indicate a sort of double unpacking, where the
array size and elements
are unpacked into the `BitmapIndexed#` constructor.

Now let's get into the leaves. We may well want to unpack the leaves into
their parents. This gets
a little messy. If we have a totally wild bitmap approach, this is pretty
straightforward, because we
can indicate which elements are pointers (to child nodes, keys, or values)
and which elements are
unboxed (hashes), and do some fancy arithmetic to figure out where each
thing is. Is there a
cleaner, more general approach? I think that's going to be very hard. From
a "pure" standpoint,
we could imagine some sort of succinct data structure that lets us locate
the nth element and figure
out which alternative of the sum it is. But then we'd need some *very*
principled operations for
constructing arrays; the usual unrestricted mutation approach is not going
to work at all.

On Thu, Oct 8, 2020, 3:12 PM Ben Gamari <b...@smart-cactus.org> wrote:

> David Feuer <david.fe...@gmail.com> writes:
>
> > Ah, too bad about reuse. What do you mean about walking over both
> pointers
> > and non-pointers? The extra word (for pointers-first layout) or few words
> > (for bitmapped) will be more than made up for in most cases by not
> needing
> > extra heap objects between a node and its children.
> >
> Let's consider the case that you want an array of (String, Int#) pairs.
> Today you have a choice of two representations:
>
>  * structure-of-arrays:
>
>        type Array = (Array# String, ByteArray#)
>
>    where the ByteArray# contains packed `Int#`s.
>
>  * array-of-structures:
>
>        data Entry = Entry String !Int#
>
>        type Array = Array# Entry
>
> In the latter case you indeed incur an extra indirection, but not in the
> former. It is true that locality may be in the former (especially
> if your entry is a wide product rather than the pair used here) but on
> the whole it's often a good choice.
>
> I think you are asking for a very general MixedArray#, which would allow
> you to specify dynamically whether each word of the array is a pointer
> or not. This seems like a lot of power and may complicate garbage
> collection. Apologies if I have misunderstood.
>
> I think you can get most of the power of this idea with a much simpler
> idea: rather than allow configuration of each word dynamically, specify
> the array as a bunch of packed unboxed tuples. For instance, in Haskell
> this might look like:
>
>      -- This class would need to be magic since the compiler would need
>      -- to generate a info table describing the entry layout for each
>      -- instance.
>      class HasArrayRep (a :: k)
>
>      data PackedArray# (a :: k)
>
>      newPackedArray# :: HasArrayRep a
>                      => Int#
>                      -> State# s -> (# State# s, PackedArray# a)
>      readPackedArray# :: HasArrayRep a
>                       => Int -> PackedArray# a ->
>                       -> State# s -> (# State# s, a)
>      -- ... etc.
>
> Implementation would involve the introduction of a new array closure
> type and suitable scavenging logic. Each HasArrayRep instance would emit
> an info table; the layout section of this info table would contain a
> bitmap (of the same sort used for stack frames) which would describe the
> layout of a single array element. To avoid unaligned accesses you would
> want to round up the element size to the nearest word size, potentially
> incurring some slop (e.g. each element of a `PackedArray# (# Word#,
> Word8# #)` would require two words). However, sub-word-size fields could
> in principle be packed into a single word (finally allowing us to get
> some mileage out of the sub-word-size primitive types we now have).
>
> Perhaps this helps in your use-case?
>
> Cheers,
>
> - Ben
>
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Reply via email to