Some form of MutableStruct# with a known number of words and a known number of pointers is basically what Ryan Yates was suggesting above, but where the word counts were stored in the objects themselves.
Given that it'd have a couple of words for those counts it'd likely want to be something we build in addition to MutVar# rather than a replacement. On the other hand, if we had to fix those numbers and build info tables that knew them, and typechecker support, for instance, it'd get rather invasive. Also, a number of things that we can do with the 'sized' versions above, like working with evil unsized c-style arrays directly inline at the end of the structure cease to be possible, so it isn't even a pure win if we did the engineering effort. I think 90% of the needs I have are covered just by adding the one primitive. The last 10% gets pretty invasive. -Edward On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton <[email protected]> wrote: > I like the possibility of a general solution for mutable structs (like Ed > said), and I'm trying to fully understand why it's hard. > > So, we can't unpack MutVar into constructors because of object identity > problems. But what about directly supporting an extensible set of unlifted > MutStruct# objects, generalizing (and even replacing) MutVar#? That may be > too much work, but is it problematic otherwise? > > Needless to say, this is also critical if we ever want best in class > lockfree mutable structures, just like their Stm and sequential > counterparts. > > On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones <[email protected]> > wrote: > >> At the very least I'll take this email and turn it into a short article. >> >> Yes, please do make it into a wiki page on the GHC Trac, and maybe make a >> ticket for it. >> >> >> Thanks >> >> >> >> Simon >> >> >> >> *From:* Edward Kmett [mailto:[email protected]] >> *Sent:* 27 August 2015 16:54 >> *To:* Simon Peyton Jones >> *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs >> *Subject:* Re: ArrayArrays >> >> >> >> An ArrayArray# is just an Array# with a modified invariant. It points >> directly to other unlifted ArrayArray#'s or ByteArray#'s. >> >> >> >> While those live in #, they are garbage collected objects, so this all >> lives on the heap. >> >> >> >> They were added to make some of the DPH stuff fast when it has to deal >> with nested arrays. >> >> >> >> I'm currently abusing them as a placeholder for a better thing. >> >> >> >> The Problem >> >> ----------------- >> >> >> >> Consider the scenario where you write a classic doubly-linked list in >> Haskell. >> >> >> >> data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) >> >> >> >> Chasing from one DLL to the next requires following 3 pointers on the >> heap. >> >> >> >> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~> Maybe DLL ~> >> DLL >> >> >> >> That is 3 levels of indirection. >> >> >> >> We can trim one by simply unpacking the IORef with -funbox-strict-fields >> or UNPACK >> >> >> >> We can trim another by adding a 'Nil' constructor for DLL and worsening >> our representation. >> >> >> >> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil >> >> >> >> but now we're still stuck with a level of indirection >> >> >> >> DLL ~> MutVar# RealWorld DLL ~> DLL >> >> >> >> This means that every operation we perform on this structure will be >> about half of the speed of an implementation in most other languages >> assuming we're memory bound on loading things into cache! >> >> >> >> Making Progress >> >> ---------------------- >> >> >> >> I have been working on a number of data structures where the indirection >> of going from something in * out to an object in # which contains the real >> pointer to my target and coming back effectively doubles my runtime. >> >> >> >> We go out to the MutVar# because we are allowed to put the MutVar# onto >> the mutable list when we dirty it. There is a well defined write-barrier. >> >> >> >> I could change out the representation to use >> >> >> >> data DLL = DLL (MutableArray# RealWorld DLL) | Nil >> >> >> >> I can just store two pointers in the MutableArray# every time, but this >> doesn't help _much_ directly. It has reduced the amount of distinct >> addresses in memory I touch on a walk of the DLL from 3 per object to 2. >> >> >> >> I still have to go out to the heap from my DLL and get to the array >> object and then chase it to the next DLL and chase that to the next array. >> I do get my two pointers together in memory though. I'm paying for a card >> marking table as well, which I don't particularly need with just two >> pointers, but we can shed that with the "SmallMutableArray#" machinery >> added back in 7.10, which is just the old array code a a new data type, >> which can speed things up a bit when you don't have very big arrays: >> >> >> >> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil >> >> >> >> But what if I wanted my object itself to live in # and have two mutable >> fields and be able to share the sme write barrier? >> >> >> >> An ArrayArray# points directly to other unlifted array types. What if we >> have one # -> * wrapper on the outside to deal with the impedence mismatch >> between the imperative world and Haskell, and then just let the >> ArrayArray#'s hold other arrayarrays. >> >> >> >> data DLL = DLL (MutableArrayArray# RealWorld) >> >> >> >> now I need to make up a new Nil, which I can just make be a special >> MutableArrayArray# I allocate on program startup. I can even abuse pattern >> synonyms. Alternately I can exploit the internals further to make this >> cheaper. >> >> >> >> Then I can use the readMutableArrayArray# and writeMutableArrayArray# >> calls to directly access the preceding and next entry in the linked list. >> >> >> >> So now we have one DLL wrapper which just 'bootstraps me' into a strict >> world, and everything there lives in #. >> >> >> >> next :: DLL -> IO DLL >> >> next (DLL m) = IO $ \s -> case readMutableArrayArray# s of >> >> (# s', n #) -> (# s', DLL n #) >> >> >> >> It turns out GHC is quite happy to optimize all of that code to keep >> things unboxed. The 'DLL' wrappers get removed pretty easily when they are >> known strict and you chain operations of this sort! >> >> >> >> Cleaning it Up >> >> ------------------ >> >> >> >> Now I have one outermost indirection pointing to an array that points >> directly to other arrays. >> >> >> >> I'm stuck paying for a card marking table per object, but I can fix that >> by duplicating the code for MutableArrayArray# and using a >> SmallMutableArray#. I can hack up primops that let me store a mixture of >> SmallMutableArray# fields and normal ones in the data structure. >> Operationally, I can even do so by just unsafeCoercing the existing >> SmallMutableArray# primitives to change the kind of one of the arguments it >> takes. >> >> >> >> This is almost ideal, but not quite. I often have fields that would be >> best left unboxed. >> >> >> >> data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil >> >> >> >> was able to unpack the Int, but we lost that. We can currently at best >> point one of the entries of the SmallMutableArray# at a boxed or at a >> MutableByteArray# for all of our misc. data and shove the int in question >> in there. >> >> >> >> e.g. if I were to implement a hash-array-mapped-trie I need to store >> masks and administrivia as I walk down the tree. Having to go off to the >> side costs me the entire win from avoiding the first pointer chase. >> >> >> >> But, if like Ryan suggested, we had a heap object we could construct that >> had n words with unsafe access and m pointers to other heap objects, one >> that could put itself on the mutable list when any of those pointers >> changed then I could shed this last factor of two in all circumstances. >> >> >> >> Prototype >> >> ------------- >> >> >> >> Over the last few days I've put together a small prototype implementation >> with a few non-trivial imperative data structures for things like Tarjan's >> link-cut trees, the list labeling problem and order-maintenance. >> >> >> >> https://github.com/ekmett/structs >> >> >> >> Notable bits: >> >> >> >> Data.Struct.Internal.LinkCut >> <https://github.com/ekmett/structs/blob/9ff2818f888aff4789b7a41077a674a10d15e6ee/src/Data/Struct/Internal/LinkCut.hs> >> provides an implementation of link-cut trees in this style. >> >> >> >> Data.Struct.Internal >> <https://github.com/ekmett/structs/blob/9ff2818f888aff4789b7a41077a674a10d15e6ee/src/Data/Struct/Internal.hs> >> provides the rather horrifying guts that make it go fast. >> >> >> >> Once compiled with -O or -O2, if you look at the core, almost all the >> references to the LinkCut or Object data constructor get optimized away, >> and we're left with beautiful strict code directly mutating out underlying >> representation. >> >> >> >> At the very least I'll take this email and turn it into a short article. >> >> >> >> -Edward >> >> >> >> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones < >> [email protected]> wrote: >> >> Just to say that I have no idea what is going on in this thread. What is >> ArrayArray? What is the issue in general? Is there a ticket? Is there a >> wiki page? >> >> >> >> If it’s important, an ab-initio wiki page + ticket would be a good thing. >> >> >> >> Simon >> >> >> >> *From:* ghc-devs [mailto:[email protected]] *On Behalf Of *Edward >> Kmett >> *Sent:* 21 August 2015 05:25 >> *To:* Manuel M T Chakravarty >> *Cc:* Simon Marlow; ghc-devs >> *Subject:* Re: ArrayArrays >> >> >> >> When (ab)using them for this purpose, SmallArrayArray's would be very >> handy as well. >> >> >> >> Consider right now if I have something like an order-maintenance >> structure I have: >> >> >> >> data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) {-# UNPACK #-} >> !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar s (Upper s)) >> >> >> >> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-} >> !(MutableByteArray s) {-# UNPACK #-} !(MutVar s (Lower s)) {-# UNPACK #-} >> !(MutVar s (Lower s)) >> >> >> >> The former contains, logically, a mutable integer and two pointers, one >> for forward and one for backwards. The latter is basically the same thing >> with a mutable reference up pointing at the structure above. >> >> >> >> On the heap this is an object that points to a structure for the >> bytearray, and points to another structure for each mutvar which each point >> to the other 'Upper' structure. So there is a level of indirection smeared >> over everything. >> >> >> >> So this is a pair of doubly linked lists with an upward link from the >> structure below to the structure above. >> >> >> >> Converted into ArrayArray#s I'd get >> >> >> >> data Upper s = Upper (MutableArrayArray# s) >> >> >> >> w/ the first slot being a pointer to a MutableByteArray#, and the next 2 >> slots pointing to the previous and next previous objects, represented just >> as their MutableArrayArray#s. I can use sameMutableArrayArray# on these for >> object identity, which lets me check for the ends of the lists by tying >> things back on themselves. >> >> >> >> and below that >> >> >> >> data Lower s = Lower (MutableArrayArray# s) >> >> >> >> is similar, with an extra MutableArrayArray slot pointing up to an upper >> structure. >> >> >> >> I can then write a handful of combinators for getting out the slots in >> question, while it has gained a level of indirection between the wrapper to >> put it in * and the MutableArrayArray# s in #, that one can be basically >> erased by ghc. >> >> >> >> Unlike before I don't have several separate objects on the heap for each >> thing. I only have 2 now. The MutableArrayArray# for the object itself, and >> the MutableByteArray# that it references to carry around the mutable int. >> >> >> >> The only pain points are >> >> >> >> 1.) the aforementioned limitation that currently prevents me from >> stuffing normal boxed data through a SmallArray or Array into an ArrayArray >> leaving me in a little ghetto disconnected from the rest of Haskell, >> >> >> >> and >> >> >> >> 2.) the lack of SmallArrayArray's, which could let us avoid the card >> marking overhead. These objects are all small, 3-4 pointers wide. Card >> marking doesn't help. >> >> >> >> Alternately I could just try to do really evil things and convert the >> whole mess to SmallArrays and then figure out how to unsafeCoerce my way to >> glory, stuffing the #'d references to the other arrays directly into the >> SmallArray as slots, removing the limitation we see here by aping the >> MutableArrayArray# s API, but that gets really really dangerous! >> >> >> >> I'm pretty much willing to sacrifice almost anything on the altar of >> speed here, but I'd like to be able to let the GC move them and collect >> them which rules out simpler Ptr and Addr based solutions. >> >> >> >> -Edward >> >> >> >> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty < >> [email protected]> wrote: >> >> That’s an interesting idea. >> >> Manuel >> >> > Edward Kmett <[email protected]>: >> >> > >> > Would it be possible to add unsafe primops to add Array# and >> SmallArray# entries to an ArrayArray#? The fact that the ArrayArray# >> entries are all directly unlifted avoiding a level of indirection for the >> containing structure is amazing, but I can only currently use it if my leaf >> level data can be 100% unboxed and distributed among ByteArray#s. It'd be >> nice to be able to have the ability to put SmallArray# a stuff down at the >> leaves to hold lifted contents. >> > >> > I accept fully that if I name the wrong type when I go to access one of >> the fields it'll lie to me, but I suppose it'd do that if i tried to use >> one of the members that held a nested ArrayArray# as a ByteArray# anyways, >> so it isn't like there is a safety story preventing this. >> > >> > I've been hunting for ways to try to kill the indirection problems I >> get with Haskell and mutable structures, and I could shoehorn a number of >> them into ArrayArrays if this worked. >> > >> > Right now I'm stuck paying for 2 or 3 levels of unnecessary indirection >> compared to c/java and this could reduce that pain to just 1 level of >> unnecessary indirection. >> > >> > -Edward >> >> > _______________________________________________ >> > ghc-devs mailing list >> > [email protected] >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >> >> >> >> _______________________________________________ >> ghc-devs mailing list >> [email protected] >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >
_______________________________________________ ghc-devs mailing list [email protected] http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
