On 9/23/13 8:32 AM, Manu wrote:
On 23 September 2013 23:58, Andrei Alexandrescu
<[email protected] <mailto:[email protected]>>
wrote:
This is a good first step though, I'm happy to discuss this, but I think
discussion about the practical application may also reveal design
details at this level.

Absolutely. This is refreshing since I've gone through the cycle of "types must be integrated into allocators and with the global GC" ... -> ... "untyped allocators can actually be extricated" once. It is now great to revisit the assumptions again.

One matter I'm rather surprised didn't come up (I assumed everyone would be quite curious about it) is that the allocator does not store the allocated size, or at least does not allow it to be discovered easily. This is a definitely appropriate topic for the design at hand.

The malloc-style APIs go like:

void* malloc(size_t size);
void free(void*);

Note that the user doesn't pass the size to free(), which means the allocator is burdened with inferring the size of the block from the pointer efficiently. Given that most allocators make crucial strategic choices depending on the size requested, this API shortcoming is a bane of allocator writers everywhere, and a source of inefficiency in virtually all contemporary malloc-style allocators.

This is most ironic since there is next to nothing that user code can do with a pointer without knowing the extent of memory addressed by it. (A notable exception is zero-terminated strings, another questionable design decision.) It all harkens back to Walter's claim of "C's biggest mistake" (with which I agree) of not defining a type for a bounded memory region.

Upon studying a few extant allocators and D's lore, I decided that in D things have evolved sufficiently to have the user pass the size upon deallocation as well:

void[] allocate(size_t size);
void deallocate(void[] buffer);

This is because the size of D objects is naturally known: classes have it in the classinfo, slices store it, and the few cases of using bald pointers for allocation are irrelevant and unrecommended.

This all makes memory allocation in D a problem fundamentally simpler than in C, which is quite an interesting turn :o).

No, I certainly understand you mean 'them', but you lead to what I'm
asking, how do these things get carried/passed around. Are they
discreet, or will they invade argument lists everywhere?

Since there's a notion of "the default" allocator, there will be ways to push/pop user-defined allocators that temporarily (or permanently) replace the default allocator. This stage of the design is concerned with allowing users to define such user-defined allocators without having to implement them from scratch.

Are they free
to flow in/out of libraries in a natural way?
These patterns are what will define the system as I see it.
Perhaps more importantly, where do these allocators get their memory
themselves (if they're not a bottom-level allocator)? Global override
perhaps, or should a memory source always be explicitly provided to a
non-bottom-level allocator?

There are a few obvious bottom allocators, such as malloc, new, sbrk, or Windows heap. All other allocators will compose by parameterizing on their parent allocator.

Sure, but the common case is that the author will almost certainly use
keyword 'new'. How can I affect that as a 3rd party?
This would require me overriding the global allocator somehow... which
you touched on earlier.

The way I'm thinking of this is to install/uninstall user-defined allocators that will satisfy calls to new. Since not all allocators support tracing, some would require the user to deallocate stuff manually.

    The library wouldn't need to worry as there would be the notion of a
    default allocator (probably backed by the existing GC).

Right. So it's looking like like the ability to override the global
allocator is a critical requirement.

Again, this is difficult to handle in the same conversation as "should we pass size upon deallocation". Yes, we need road laws, but it's hard to talk about those and engine lubrication in the same breath. For me, "critical" right now is to assess whether the untyped API misses an important category of allocators, what safety level it has (that's why "expand" is so different from "realloc"!) etc.

        And does that mean that applications+libraries are required to
        ALWAYS
        allocate through given allocator objects?


    Yes, they should.


Then we make keyword 'new' redundant?

Probably not. Typed allocators will need to integrate with (and occasionally replace) the global shared GC.

    The problem is that installing an allocator does not get to define
    what a pointer is and what a reference is.

Why not?

Because that requires a language change. I'm not sure you realize but you move the goalposts all the time. We were talking within the context of libraries and installing allocators dynamically and all of a sudden you get to change what the compiler thinks a pointer is.

A pointer has a type, like anything else. An ARC pointer can
theoretically have the compiler insert ARC magic.
That does imply though that the allocator affects the type, which I
don't like... I'll think on it.

I talked Walter's ear off a while ago at an ACCU conference about the notion that reference counting could be a switch passed to the compiler. Recently he's authored a DIP about the compiler inserting refcounting primitives in the generated code. Unfortunately I think the DIP is awfully incomplete and superficial (to the extent it would bring more damage to the language if any implementation effort would be invested in it at this point).

Can it be done? Probably. But it would be an absolutely major effort.

    These are notions hardwired into the language, so the notion of
    turning a switch and replacing the global GC with a reference
    counting scheme is impossible at the level of a library API.


Indeed it is. So is this API being built upon an incomplete foundation?

The API is quite good at what it puports to do.

    What I do hope to get to is to have allocators define their own
    pointers and reference types. User code that uses those will be
    guaranteed certain allocation behaviors.


Interesting, will this mangle the pointer type, or the object type being
pointed to? The latter is obviously not desirable. Does the former
actually work in theory?

I don't think I understand what you mean. Honest, it seems to me you're confused about what you want, how it can be done, and what moving pieces are involved.

One example is Rust, which defines several pointer types to implement its scoping and borrowing semantics. I think it's not easy to achieve its purported semantics with fewer types, and time will tell whether programmers will put up with the complexity for the sake of the corresponding benefits.

    At this point collect() is only implemented by the global GC. It is
    possible I'll drop it from the final design. However, it's also
    possible that collect() will be properly defined as "collect all
    objects allocated within this particular allocator that are not
    referred from any objects also allocated within this allocator". I
    think that's a useful definition.

Perhaps. I'm not sure how this situation arises though. Unless you've
managed to implement your own GC inside an allocator.

That should be doable within D today.

I think a critical detail to keep in mind, is that (I suspect) people
simply won't use it if it doesn't interface with keyword 'new'.

I think this is debatable. For one, languages such as Java and C++ still have built-in "new" but quite ubiquitously unrecommend their usage in user code. Far as I can tell that's been a successful campaign.

For two, there are allocations that don't even entail calls to new, such as array concatenation.

It's extremely common that I want to enforce that a library exist
entirely within a designated heap. It can't fall back to the global GC.

I work on platforms where memory is not unified. Different resources
need to go into different heaps.

The proposed API would make that entirely possible. You seem to care only with the appendix "and mind you I only want to use new for all of those!" which is a bit out there. But nevertheless good feedback.


Andrei

Reply via email to