On Sep 4, 2013, at 11:33 AM, Brendan Eich <[email protected]> wrote:
> Filip Pizlo wrote:
>> So, how big are your non-expanddable typed arrays, and what do they look
>> like? If they're not smaller than 16 bytes in the common case with 32-bit
>> pointers, or 32 bytes in the common case with 64-bit pointers, then there is
>> no performance argument in favor of getting rid of expandable properties.
>
> I like your analysis, it helps greatly to be quantitative and to talk
> concretely about implementation trade-offs. However, I don't think it proves
> as much as you assert.
Depends on your interpretation of what I'm asserting. ;-) I'm talking about
typed array objects - ones that can be pointed to by JavaScript values, and
that have a prototype chain, can be aliased, etc.
>
> Suppose I want (as IBM did for years, and may still) to implement IEEE 754r
> decimal in JS, with minimal overhead. I would need 128 bits of flat storage,
> no variable length, no .buffer or aliasing, and *no expandos*. Can binary
> data help me do that? If so, how small can the thing be? I'd like a multiple
> of 16 bytes, but on 64-bit targets that does not leave enough room for TBLR
> and we don't really need BLR anyway.
>
> If we can't implement efficient-enough 754r decimal using binary data, that's
> sad. Not the end of the world, and it doesn't prove a whole lot about
> anything (perhaps we'll figure out something next year). But the small,
> fixed-size array case exists (think of Float32Array 4-vectors, homogeneous
> coordinates). It seems to me you are giving this use-case short shrift.
I'm not. I care deeply about small arrays. This analysis wasn't merely a
thought experiment, it arose from me spending a month trying to figure out how
to aggressively reduce the overhead of typed arrays. My original hope was to
get down to Java-level overheads and my conclusion was that unless I wanted to
severely punish anyone who said .buffer, I'd have to have one more word of
overhead than Java (i.e. 16 bytes on 32-bit instead of 12 bytes on 32-bit).
My point is that having custom properties, or not, doesn't change the overhead
for the existing typed array spec and hence has no effect on small arrays. The
reasons for this include:
- Typed arrays already have to be objects, and hence have a well-defined
behavior on '=='.
- Typed arrays already have to be able to tell you that they are in fact typed
arrays, since JS doesn't have static typing.
- Typed arrays already have prototypes, and those are observable regardless of
expandability. A typed array from one global object will have a different
prototype than a typed array from a different global object. Or am I
misunderstanding the spec?
- Typed arrays already have to know about their buffer.
- Typed arrays already have to know about their offset into the buffer. Or,
more likely, they have to have a second pointer that points directly at the
base from which they are indexed.
- Typed arrays already have to know their length.
You're not proposing changing these aspects of typed arrays, right?
The super short message is this: so long as an object obeys object identity on
'==' then you can have "free if unused, suboptimal if you use them" custom
properties by using a weak map on the side. This is true of typed arrays and
it would be true of any other object that does object-style ==. If you
allocate such an object and never add a custom property then the weak map will
never have an entry for it; but if you put custom properties in the object then
the map will have things in it. But with typed arrays you can do even better
as my previous message suggests: so long as an object has a seldom-touched
field and you're willing to eat an extra indirection or an extra branch on that
field, you can have "free if unused, still pretty good if you use them" custom
properties by displacing that field. Typed arrays have both of these
properties right now and so expandability is a free lunch.
Still find this discussion amusing? Here's the long story is: It is these
things that I list above that lead to a 16 byte overhead on 32-bit, and a
32-byte overhead on 64-bit in the best "sane" case. Giving typed array objects
expandability doesn't add to this overhead, because two of the fields necessary
to implement the above (the type, and the buffer) can be displaced for pointing
to property storage. Any imaginable attempt to reduce the overhead incurred by
the information - using BBOP (big bag of pages) for the type, using an
out-of-line weak map for the buffer or the type, encoding some of the bits
inside the pointer to the typed array, etc. - can be also used to eradicate any
space overhead you'd need for custom properties, so long as you're on board
with the "free if unused, sub-optimal if you use them" philosophy.
So if we did introduce a new type that has lower overheads, for example a new
kind of typed arrays - or an entirely new kind of type, say Int64 or
FloatDecimal or whatever - then for those new types we can revisit some of the
cruft listed above. We can say that they don't have an observable buffer, or
that their prototype isn't observable, or that they have string-like ==
behavior - and then doing those things will indeed reduce space overhead.
Certainly if you introduced a new kind of typed arrays that didn't have a
.buffer, then you could definitely shave off one pointer. Or if you were
willing to say that using .buffer was super expensive (a direction I rejected
based on the typed array code I saw) then you would also be able to shave off
one pointer. And you'd be able to do this regardless of whether typed arrays
allowed for custom properties - you can allow for custom properties for free so
long as typed array objects obey object identity on == since if you really
wanted them you could just use a side table.
So, to sum up my point: if you already implement typed arrays as they are
currently specified, then adding custom properties to them will not increase
size or reduce performance. This argument isn't true for all types - for
example adding custom properties to strings wouldn't be as easy - and that
argument may not be true for any future FloatingDecimal or Int64 type. But it
is true for typed arrays right now. My particular way of implementing "free if
unused" custom properties is to displace the .buffer pointer but even if you
got rid of that then I could still support "free if unused" custom properties
so long as object identity worked.
As a final note, I was doing a thought-experiment of how to optimize Int64 in a
hypothetical world where the new Int64 type had to behave as if it was
implemented totally in JS and it was expandable, just for fun. I came to two
conclusions:
- If a VM will always heap-allocate exactly 64 bits of memory (not more, not
less) for each Int64, in a special side-heap without any object header
overhead, then you can totally support fully expandable Int64's and you can
make the prototype observable and you can even allow it to be changed. You do
it roughly as follows: the type of the object (i.e. the fact that it's an
Int64) is identified by its place in the heap; the prototype is also identified
by the place (as in BBOP, see page 3 of
http://dspace.mit.edu/bitstream/handle/1721.1/6278/AIM-420.pdf?sequence=2); and
any custom properties are stored in a weak table on the side. This gives you
excellent performance so long as you never assign __proto__ and never add
custom properties, and slightly crappy performance if you do either of those
things (notably, assigning to __proto__ may require an evacuation or poisoning
the type table entry for an entire page). The outcome is that Int64 instances
behave like full-fledged objects with zero compromises. If programmers stay
away from the crazy then they'll have a great time, but if they wan't to go
bananas then the language still allows it but with some performance cost.
- If the VM wants to go further and create immediate representations of some or
all Int64's, similarly to what VMs do for JS small integers today, then the
main problem you run into is object identity: does Int64(1).add(Int64(1)) ==
Int64(1).add(Int64(1))? A naive JS implementation of an Int64 class would say
that this is false, since it's likely to allocate a new Int64 each time. But
an immediate representation would have no choice but to say true. You can work
around this if you say that the VM's implementation of Int64 operations behaves
as if the add()/sub()/whatever() methods used a singleton cache. You can still
then have custom properties; i.e. you could do Int64(2).foo = 42 and then
Int64(1).add(Int64(1)).foo will return 42, since the VM can keep an
immediate-int64-to-customproperties map on the side. That's kind of analogous
to how you could put a setter on field '2' of Array.prototype and do some
really hilarious things.
Anyway, the last three paragraphs are somewhat orthogonal to the discussion of
typed arrays since for those it's just so darn easy to do expandability without
overhead, but I do wanted to hit on two points:
- Whether or not something is expandable has got absolutely nothing to do with
how much memory it uses. Not one bit of memory is wasted from allowing "free
if unused" expandability, and that is a natural outcome so long as objects obey
object identity.
- Don't restrict the language only to make VM hackers' lives easier. VM
hackers' lives aren't supposed to be easy. The point of a language is to
delight users, not VM hackers.
-Filip
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss