Re: Non-extensibility of Typed Arrays

Filip Pizlo Wed, 04 Sep 2013 12:36:02 -0700

On Sep 4, 2013, at 11:33 AM, Brendan Eich <[email protected]> wrote:

> Filip Pizlo wrote:
>> So, how big are your non-expanddable typed arrays, and what do they look 
>> like?  If they're not smaller than 16 bytes in the common case with 32-bit 
>> pointers, or 32 bytes in the common case with 64-bit pointers, then there is 
>> no performance argument in favor of getting rid of expandable properties.
> 
> I like your analysis, it helps greatly to be quantitative and to talk 
> concretely about implementation trade-offs. However, I don't think it proves 
> as much as you assert.

Depends on your interpretation of what I'm asserting. ;-)  I'm talking about 
typed array objects - ones that can be pointed to by JavaScript values, and 
that have a prototype chain, can be aliased, etc.

> 
> Suppose I want (as IBM did for years, and may still) to implement IEEE 754r 
> decimal in JS, with minimal overhead. I would need 128 bits of flat storage, 
> no variable length, no .buffer or aliasing, and *no expandos*. Can binary 
> data help me do that? If so, how small can the thing be? I'd like a multiple 
> of 16 bytes, but on 64-bit targets that does not leave enough room for TBLR 
> and we don't really need BLR anyway.
> 
> If we can't implement efficient-enough 754r decimal using binary data, that's 
> sad. Not the end of the world, and it doesn't prove a whole lot about 
> anything (perhaps we'll figure out something next year). But the small, 
> fixed-size array case exists (think of Float32Array 4-vectors, homogeneous 
> coordinates). It seems to me you are giving this use-case short shrift.

I'm not.  I care deeply about small arrays.  This analysis wasn't merely a 
thought experiment, it arose from me spending a month trying to figure out how 
to aggressively reduce the overhead of typed arrays.  My original hope was to 
get down to Java-level overheads and my conclusion was that unless I wanted to 
severely punish anyone who said .buffer, I'd have to have one more word of 
overhead than Java (i.e. 16 bytes on 32-bit instead of 12 bytes on 32-bit).

My point is that having custom properties, or not, doesn't change the overhead 
for the existing typed array spec and hence has no effect on small arrays.  The 
reasons for this include:

- Typed arrays already have to be objects, and hence have a well-defined 
behavior on '=='.

- Typed arrays already have to be able to tell you that they are in fact typed 
arrays, since JS doesn't have static typing.

- Typed arrays already have prototypes, and those are observable regardless of 
expandability.  A typed array from one global object will have a different 
prototype than a typed array from a different global object.  Or am I 
misunderstanding the spec?

- Typed arrays already have to know about their buffer.

- Typed arrays already have to know about their offset into the buffer.  Or, 
more likely, they have to have a second pointer that points directly at the 
base from which they are indexed.

- Typed arrays already have to know their length.

You're not proposing changing these aspects of typed arrays, right?

The super short message is this: so long as an object obeys object identity on 
'==' then you can have "free if unused, suboptimal if you use them" custom 
properties by using a weak map on the side.  This is true of typed arrays and 
it would be true of any other object that does object-style ==.  If you 
allocate such an object and never add a custom property then the weak map will 
never have an entry for it; but if you put custom properties in the object then 
the map will have things in it.  But with typed arrays you can do even better 
as my previous message suggests: so long as an object has a seldom-touched 
field and you're willing to eat an extra indirection or an extra branch on that 
field, you can have "free if unused, still pretty good if you use them" custom 
properties by displacing that field.  Typed arrays have both of these 
properties right now and so expandability is a free lunch.

Still find this discussion amusing?  Here's the long story is: It is these 
things that I list above that lead to a 16 byte overhead on 32-bit, and a 
32-byte overhead on 64-bit in the best "sane" case.  Giving typed array objects 
expandability doesn't add to this overhead, because two of the fields necessary 
to implement the above (the type, and the buffer) can be displaced for pointing 
to property storage.  Any imaginable attempt to reduce the overhead incurred by 
the information - using BBOP (big bag of pages) for the type, using an 
out-of-line weak map for the buffer or the type, encoding some of the bits 
inside the pointer to the typed array, etc. - can be also used to eradicate any 
space overhead you'd need for custom properties, so long as you're on board 
with the "free if unused, sub-optimal if you use them" philosophy.

So if we did introduce a new type that has lower overheads, for example a new 
kind of typed arrays - or an entirely new kind of type, say Int64 or 
FloatDecimal or whatever - then for those new types we can revisit some of the 
cruft listed above.  We can say that they don't have an observable buffer, or 
that their prototype isn't observable, or that they have string-like == 
behavior - and then doing those things will indeed reduce space overhead. 
Certainly if you introduced a new kind of typed arrays that didn't have a 
.buffer, then you could definitely shave off one pointer.  Or if you were 
willing to say that using .buffer was super expensive (a direction I rejected 
based on the typed array code I saw) then you would also be able to shave off 
one pointer.  And you'd be able to do this regardless of whether typed arrays 
allowed for custom properties - you can allow for custom properties for free so 
long as typed array objects obey object identity on == since if you really 
wanted them you could just use a side table.

So, to sum up my point: if you already implement typed arrays as they are 
currently specified, then adding custom properties to them will not increase 
size or reduce performance.  This argument isn't true for all types - for 
example adding custom properties to strings wouldn't be as easy - and that 
argument may not be true for any future FloatingDecimal or Int64 type.  But it 
is true for typed arrays right now.  My particular way of implementing "free if 
unused" custom properties is to displace the .buffer pointer but even if you 
got rid of that then I could still support "free if unused" custom properties 
so long as object identity worked.

As a final note, I was doing a thought-experiment of how to optimize Int64 in a 
hypothetical world where the new Int64 type had to behave as if it was 
implemented totally in JS and it was expandable, just for fun.  I came to two 
conclusions:

- If a VM will always heap-allocate exactly 64 bits of memory (not more, not 
less) for each Int64, in a special side-heap without any object header 
overhead, then you can totally support fully expandable Int64's and you can 
make the prototype observable and you can even allow it to be changed.  You do 
it roughly as follows: the type of the object (i.e. the fact that it's an 
Int64) is identified by its place in the heap; the prototype is also identified 
by the place (as in BBOP, see page 3 of 
http://dspace.mit.edu/bitstream/handle/1721.1/6278/AIM-420.pdf?sequence=2); and 
any custom properties are stored in a weak table on the side.  This gives you 
excellent performance so long as you never assign __proto__ and never add 
custom properties, and slightly crappy performance if you do either of those 
things (notably, assigning to __proto__ may require an evacuation or poisoning 
the type table entry for an entire page).  The outcome is that Int64 instances 
behave like full-fledged objects with zero compromises.  If programmers stay 
away from the crazy then they'll have a great time, but if they wan't to go 
bananas then the language still allows it but with some performance cost.

- If the VM wants to go further and create immediate representations of some or 
all Int64's, similarly to what VMs do for JS small integers today, then the 
main problem you run into is object identity: does Int64(1).add(Int64(1)) == 
Int64(1).add(Int64(1))?  A naive JS implementation of an Int64 class would say 
that this is false, since it's likely to allocate a new Int64 each time.  But 
an immediate representation would have no choice but to say true.  You can work 
around this if you say that the VM's implementation of Int64 operations behaves 
as if the add()/sub()/whatever() methods used a singleton cache.  You can still 
then have custom properties; i.e. you could do Int64(2).foo = 42 and then 
Int64(1).add(Int64(1)).foo will return 42, since the VM can keep an 
immediate-int64-to-customproperties map on the side.  That's kind of analogous 
to how you could put a setter on field '2' of Array.prototype and do some 
really hilarious things.

Anyway, the last three paragraphs are somewhat orthogonal to the discussion of 
typed arrays since for those it's just so darn easy to do expandability without 
overhead, but I do wanted to hit on two points:

- Whether or not something is expandable has got absolutely nothing to do with 
how much memory it uses.  Not one bit of memory is wasted from allowing "free 
if unused" expandability, and that is a natural outcome so long as objects obey 
object identity.

- Don't restrict the language only to make VM hackers' lives easier.  VM 
hackers' lives aren't supposed to be easy.  The point of a language is to 
delight users, not VM hackers.

-Filip

_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

Re: Non-extensibility of Typed Arrays

Reply via email to