Re: [go-nuts] Block-based data structures and the Type Parameters - Draft Design

roger peppe Mon, 22 Jun 2020 06:46:25 -0700

On Mon, 22 Jun 2020 at 14:26, Andrew Werner <awerne...@gmail.com> wrote:


> Hey Rog,
>
> I think I sent you a private reply earlier. My bad. I now see what you're
> proposing in the first proposal think it makes a lot of sense.
>
> The thing that I think I had missed was this piece of magic:
> // New returns a new Dequeue instance using B as the block storage backing.
> func New(type T interface{}, *B Block(T))() *Dequeue(T) {
>
> One concern I have here is that the `Slice` method is on the array itself
> so it's possible that invoking it will copy the entire array to the stack.
> Does that concern you?
>

It won't copy the array because it's a pointer method (that's what the "*B"
means).
In fact it *must not* copy the array because then it couldn't return a
reference to it.

The second proposal feels almost identical to what I proposed. Can you help
> me understand how it differs?
>

It is indeed a similar approach. The main difference (beyond the somewhat
simpler dequeue implementation itself) is that my New method returns a
concrete type, not an interface. This means that it's easier to jump to the
definition of its method implementations, potentially more performant
because it can use direct rather than indirect calls (*), and arguably more
idiomatic Go. It also means it's possible to add more methods without
breaking backward compatibility, which is a significant benefit in my view.

  cheers,
    rog.

(*) you'll still get *some* indirect calls, amortised over the buffer size.


>
> On Monday, June 22, 2020 at 7:58:43 AM UTC-4, rog wrote:
>>
>> Thanks for the interesting use case, Andrew!
>>
>> I've experimented with a slightly different approach:
>>
>> https://go2goplay.golang.org/p/AkqzbWmpj6t
>>
>> It expects the caller to implement a method on the array to get a
>> reference to the
>> underlying storage, but that's fairly trivial to implement.
>>
>> I've built it on top of the "list" package from the stdlib (available in
>> the dev.go2go branch under src/cmd/go2go/testdata/go2path).
>>
>> I think it has the potential to be a bit more performant as the
>> type-hiding abstraction (Block in my example) is inside the static type, so
>> most calls can be direct rather than indirect.
>>
>> It's *almost* possible to make this implementation use a slice-based
>> implementation of Block too (thus allowing dynamically specified block
>> sizes at the cost of an extra indirection through the linked list node).
>> That's not possible because the pointer method is invoked on the zero-value
>> of the type, so without using a global variable, there's no way for the
>> method to find out the required block size. A way to work around that is to
>> use another level of indirection instead of using pointer methods directly:
>>
>> https://go2goplay.golang.org/p/nMreEMpqXd1
>>
>> This makes me wonder whether the pointer methods restriction in the
>> generics draft proposal might not be unnecessary and perhaps not
>> necessarily a good idea. ISTM that the only reason for restricting methods
>> to pointer-only is so that they can be called on zero values, and that
>> necessarily restricts the available parameterisation to static-only (or
>> using global variables). I suspect that might turn out to be an
>> anti-pattern. Instead, using function or interface value that operates on
>> the pointer is strictly more general, I think, and doesn't suffer from that
>> problem.
>>
>>   cheers,
>>     rog.
>>
>>
>> On Sun, 21 Jun 2020 at 17:43, Andrew Werner <awer...@gmail.com> wrote:
>>
>>> Thanks for the reply! If I read it correctly, it is already possible to
>>> specify a type constraint as an Array of a given size and so the first part
>>> of the "What I might have expected" actually does work. I noticed that it
>>> works in that it works as a type constraint but I found that types
>>> constrained that way do not support indexing or slicing. Perhaps that's
>>> just a limitation of the prototype and not the proposal.
>>>
>>> The other thing that I think I'm proposing (thanks for the correction)
>>> is a way to represent all arrays of a given type:
>>>
>>> type Array(type T) interface {
>>>     type [...]T
>>> }
>>>
>>> Thanks for dealing with my very long-winded way of saying that.
>>>
>>> On Sunday, June 21, 2020 at 12:35:53 PM UTC-4, David Finkel wrote:
>>>>
>>>>
>>>>
>>>> On Sat, Jun 20, 2020 at 1:22 AM Andrew Werner <awer...@gmail.com>
>>>> wrote:
>>>>
>>>>> [The body of this email is a duplication of the README in 
>>>>> https://github.com/ajwerner/go2dequeue/
>>>>> which also contains the sample implementation]
>>>>>
>>>>> Exercise building a dequeue with the go2 Type Parameter Draft
>>>>>
>>>>> This project is an exploration with the go2go / Type Parameters -
>>>>> Design Draft
>>>>> <https://go.googlesource.com/proposal/+/refs/heads/master/design/go2draft-type-parameters.md>
>>>>> with an eye towards block-based data structures which promote access
>>>>> locality.
>>>>>
>>>>> Such data structures traditionally utilize blocks of objects laid out
>>>>> contiguously in memory. In go, today, one generally specializes such data
>>>>> structures which are performance critical. This is especially important in
>>>>> the case of reducing allocations which occur on critical paths. Using a
>>>>> sync.Pool can help with allocations in some cases but can be clunky to use
>>>>> and, well, do create more pointers for GC to worry about and don't have 
>>>>> the
>>>>> access locality.
>>>>>
>>>>> Whether it's really important to have data structures which include
>>>>> arrays of other data structures laid out contiguously in RAM is really
>>>>> important is sort of orthogonal. Let's just assume that it is for now and
>>>>> look at how we'd do it. We only need to consider rejecting the importance
>>>>> of this case if we can't find a good solution or idiom to support it. The
>>>>> google/btree <https://github.com/google/btree> library README links
>>>>> this Google Open Source Blog Post on block-based container
>>>>> <https://opensource.googleblog.com/2013/01/c-containers-that-save-memory-and-time.html>
>>>>> performance indicating it does matter. The interesting thing about that
>>>>> library is it hardly gets the access locality of the C++ library in the
>>>>> blog post it references.
>>>>>
>>>>> The challenge this library explores is to layout blocks in structures
>>>>> contiguously in RAM while allowing the user to have some control over that
>>>>> block size.
>>>>> This example
>>>>>
>>>>> The approach this experiment takes is to allow the users to specify
>>>>> the block size by providing a function to map an array type to a slice. 
>>>>> The
>>>>> allows the blocks to contain a slice which references the array. The
>>>>> overhead to maintain the slice header is 3 words but the cost is probably
>>>>> less in memory and more in the optimizations the compiler will be less 
>>>>> able
>>>>> to perform. In particular, it probably will need to do bounds checking on
>>>>> that slice and there are probably some opportunities to avoid the pointer
>>>>> interactions. The interface ends up being pretty reasonable though a 
>>>>> little
>>>>> bit ... opaque?
>>>>>
>>>>> What this looked like in this demo is*
>>>>>
>>>>> // Say we want to create a dequeue of things.type Thing struct { Foo 
>>>>> int64, Bar int64, ID [16]byte }// The below incantation will create one 
>>>>> with blocks of 37 elements.d := NewWithArray(func(a *[37]Thing) []Thing { 
>>>>> return (*a)[:] })// The nodes in the linked list of this structure are 
>>>>> 1240 bytes, not that// that's a great number or anything like that, just 
>>>>> saying it's pretty big.// Maybe there'd be something good to get out of 
>>>>> aligning things are cache line// sizes. See TestNodeSize.
>>>>>
>>>>> Other things you could do Use a slice for the buffer
>>>>>
>>>>> A different approach would be to just have the array in the node, take
>>>>> a buffer size and then keep a sync.Pool or freelist of slices. Today's
>>>>> github.com/google/btree does this but way worse in that it just keeps
>>>>> a slice of btree.Item. So fine, it's be two objects, the nodes and their
>>>>> item buffers. Maybe that is the answer but it's pretty lame. My feeling is
>>>>> once somebody is optimizing a data structure for access locality they are
>>>>> trying to optimize as far as they can go.
>>>>> Ignore or augment the language generics with manual or automatic
>>>>> specialization
>>>>>
>>>>> Today, in performance critical cases, people use a variety of
>>>>> automatic or manual specialization. There are tools like
>>>>> https://github.com/cheekybits/genny or
>>>>> https://github.com/mmatczuk/go_generics. Note that the latter has
>>>>> been used to good effect in CockroachDB for building specialized
>>>>> copy-on-write, interval B-trees.
>>>>>
>>>>> One answer to this problem might just be that go's generics solution
>>>>> doesn't solve for the problem of specializing block-based data structures.
>>>>> That's not a particularly satisfying answer.
>>>>> What I might have expected / guess I'm proposing
>>>>>
>>>>> Type lists are a thing for utilizing language features which only
>>>>> apply to certain kinds of types. There should be a way to create a type
>>>>> list for arrays of types.
>>>>>
>>>> As a first pass I would have anticipated that there would be a way to
>>>>> inform this package of a buffer size maybe drawn from a fixed number of
>>>>> sizes. For example:
>>>>>
>>>>> type BlockSize int
>>>>> const (
>>>>>     _ BlockSize = 1 << (4*iota)
>>>>>     Block16
>>>>>     Block256
>>>>> )
>>>>> // Array here allowstype Array(type T) interface {
>>>>>     type [Block16]T, [Block256]T
>>>>> }
>>>>> func New(type T)(blockSize BlockSize) Dequeue(T) {
>>>>>
>>>>> Hmm, this function signature would require being able to return
>>>> different types based on a non-type-parameter, which would require Dequeue
>>>> to be an interface (I don't see a definition, so I'm not sure)
>>>>
>>>> From what I can tell, though, if you make the array-type a
>>>> type-parameter directly, though, this becomes workable in the current
>>>> proposal (although rather awkward for users).
>>>> e.g.
>>>> https://go2goplay.golang.org/p/UTnElflTH27
>>>>
>>>> type ArrayT(type T) interface {
>>>>         type [3]T, [4]T
>>>> }
>>>>
>>>> func NewArray(type T ArrayT(V), V interface{})() T {
>>>>          var v T
>>>>          return v
>>>> }
>>>>
>>>
>>> Hmm, maybe I was just holding it wrong and the proposal in "What I might
>>> have expected" already is the proposal.
>>>
>>>
>>>>
>>>>     switch blockSize {
>>>>>     case Block16:
>>>>>         return &(dequeue(T, [Block16]T){ ... })
>>>>>     case Block256:
>>>>>         return &(dequeue(T, [Block16]T){})
>>>>>
>>>>> nit: I assume the second type-parameter for this case statement should
>>>> be Block256.
>>>>
>>>>>     }
>>>>> }
>>>>> type dequeue(type T interface{}, ArrayT Array(T)) struct {
>>>>>     len
>>>>> }
>>>>> type node(type T interface{}, ArrayT Array(T)) struct {
>>>>>     left, right *node(T, ArrayT)
>>>>>     rb ringBuf(T, ArrayT)
>>>>> }
>>>>> type ringBuf(type T interface{}, ArrayT Array(T)) struct {
>>>>>     head, len int
>>>>>     // buf is known to be an array type with a size of either 16 or 256 so
>>>>>     // I'd expect to be able to interact with this like an array. I don't 
>>>>> think
>>>>>     // that that works today.
>>>>>     buf ArrayT
>>>>> }
>>>>>
>>>>> The above also seems needlessly clunky. I'd prefer if I could just
>>>>> represent all array types like this:
>>>>>
>>>>> type Array(type T) interface {
>>>>>     [...]T
>>>>> }
>>>>>
>>>>> I like that approach to distinguishing between slices and arrays in
>>>> type-lists.
>>>> Nit: I think it would still need the type keyword, so it would look
>>>> like:
>>>>
>>>> type Array(type T) interface {
>>>>         type [...]T
>>>> }
>>>>
>>> Yes, definitely, my bad.
>>>
>>>>
>>>> I feel like there must be an interaction with something else that I'm
>>>> missing, though. Since there is some discussion about limitations with
>>>> respect to the lack of non-type parameters in the omissions section
>>>> <https://go.googlesource.com/proposal/+/refs/heads/master/design/go2draft-type-parameters.md#omissions>.
>>>> (although not about arrays in type-list constraints, specifically)
>>>>
>>>>> Then I could do the following and everything else should just fall out.
>>>>>
>>>>> func New(type T interface{}, ArrayT Array(T))() Dequeue(T) {
>>>>>     return &(dequeue(T, ArrayT){})
>>>>> }
>>>>>
>>>>> The above feels to me to be compatible with the goals and idioms of
>>>>> the proposal but I could be missing out on some major complexity. Hope
>>>>> y'all find this helpful.
>>>>> Aside
>>>>>
>>>>> It would be cool to be able to embed a generic type in another
>>>>> generics type. I can see why it's annoying because the name of the field
>>>>> will get totally mangled in the specialization but maybe that's not so 
>>>>> bad?
>>>>> Maybe you'd actually call the embedded type by its unspecialized type name
>>>>> even in the specialized version.
>>>>>
>>>>> I would have liked to embed the ringBuf in the node. Given it's not
>>>>> embedded it's almost not worth abstracting.
>>>>>
>>>> I think this is already addressed in the current design
>>>> <https://go.googlesource.com/proposal/+/refs/heads/master/design/go2draft-type-parameters.md#embedding-an-instantiated-interface-type>.
>>>> You are allowed to directly embed type-parameters, and they have the name
>>>> of the template argument (rather than its instantiated value).
>>>>
>>> Cool, I'll try it out. I found it not compiling with `go tool go2go` in
>>> my first pass. Maybe I was holding it wrong.
>>>
>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "golang-nuts" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to golan...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/golang-nuts/CA%2BvRuzOYXz1t6CsWLd6gxS4AErRNCwA_ctdSYiu3Mq-YH_TEDA%40mail.gmail.com
>>>>> <https://groups.google.com/d/msgid/golang-nuts/CA%2BvRuzOYXz1t6CsWLd6gxS4AErRNCwA_ctdSYiu3Mq-YH_TEDA%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "golang-nuts" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to golan...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/golang-nuts/2c712c9d-2a5b-44a9-b783-15585a6654b1o%40googlegroups.com
>>> <https://groups.google.com/d/msgid/golang-nuts/2c712c9d-2a5b-44a9-b783-15585a6654b1o%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/golang-nuts/24115910-1de9-4ae4-bfc9-c9c8ecdbf539o%40googlegroups.com
> <https://groups.google.com/d/msgid/golang-nuts/24115910-1de9-4ae4-bfc9-c9c8ecdbf539o%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAJhgachBJKnXe2Cojv-RPR44zRNemTXjxfiaOct3m1Y1MW3d3Q%40mail.gmail.com.

Re: [go-nuts] Block-based data structures and the Type Parameters - Draft Design

Reply via email to