Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-04 Thread Daniel Micay
On 03/04/14 11:48 PM, Nathan Myers wrote:
 
 Perhaps the best thing is to wait a month (or two or three) until DST
 is more of a reality and then see how we feel.

 Are you thinking we should also wait before converting the current uses
 of ~[T] to Vec<T>? Doing the migration gives us the performance[1] and
 zero-length-zero-alloc benefits, but there were some concerns about
 additional library churn if we end up converting back to DST's ~[T].
 
 I can't speak about how a usage choice affects the standard library,
 but it seems worth mentioning that vector capacity doesn't have to be
 in the base object; it can live in the secondary storage, prepended
 before the elements.

Needing to use a header seriously hurts the performance. The new vector
is 7x faster at pushing elements when space isn't reserved compared to
the old one, all due to leaving off the length/capacity header.

The overhead would be less if it stored the capacity inside *and*
outside the vector, but it's still overhead. It's an extra overflow
check branch along with needing to calculate padding for alignment in
the future, extra space in the memory allocation and more pointer
aliasing issues.
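
To make the layout difference concrete, here is a minimal sketch in present-day Rust. It assumes `Box<[T]>` as the modern spelling of the DST `~[T]`, and the helper `words` is a name made up for this illustration. On current compilers the growable vector keeps pointer, length and capacity in the handle itself, while the owned slice carries only pointer and length; the exact sizes are an implementation detail, not a language guarantee.

```rust
use std::mem::size_of;

/// Number of machine words occupied by a value of type `T`.
/// (Hypothetical helper, only for this illustration.)
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // Growable vector: pointer, length and capacity all live in the handle,
    // so no header is needed in front of the elements.
    println!("Vec<u32>:   {} words", words::<Vec<u32>>());
    // Owned slice: pointer and length only, no room for a capacity.
    println!("Box<[u32]>: {} words", words::<Box<[u32]>>());
    // Borrowed slice: pointer and length.
    println!("&[u32]:     {} words", words::<&[u32]>());
}
```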

 A zero-length Vec<T> might be null for the case of zero capacity,
 or non-null when it has room to grow.

It's going to be forbidden from actually being null in the future when
the Option-like enum optimization is applied to it via an attribute.
This work has already landed - calling exchange_free on a zero-size
allocation is *forbidden*.
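
As a rough illustration of where this ended up, the sketch below checks the Option-like optimization in present-day Rust. The helper `option_is_free` is a made-up name, and the `Vec` result reflects current compilers rather than a documented guarantee.

```rust
use std::mem::size_of;

/// True when wrapping `T` in `Option` costs no extra space, i.e. `None`
/// is encoded in a pointer value that a valid `T` can never hold.
/// (Hypothetical helper, only for this illustration.)
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

fn main() {
    println!("Option<Box<u8>> is free: {}", option_is_free::<Box<u8>>());
    println!("Option<Vec<u8>> is free: {}", option_is_free::<Vec<u8>>());

    // An empty vector owns no allocation, yet its pointer is not null,
    // which leaves null available to encode `None`.
    let v: Vec<u64> = Vec::new();
    println!("capacity = {}, null = {}", v.capacity(), v.as_ptr().is_null());
}
```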



___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-04 Thread Manu Thambi


 Needing to use a header seriously hurts the performance. The new
 vector is 7x faster at pushing elements when space isn't reserved
 compared to the old one, all due to leaving off the length/capacity
 header. The overhead would be less if it stored the capacity inside
 *and* outside the vector, but it's still overhead. It's an extra
 overflow check branch along with needing to calculate padding for
 alignment in the future, extra space in the memory allocation and more
 pointer aliasing issues.

Perhaps I am not understanding you correctly. Assuming that the capacity
is stored inside and outside Vec, the only overhead I see is during
allocation/deallocation. Otherwise the code will be identical. If you are
worried about space, there is a cost of passing around Vecs (vs ~[T]),
which consumes an extra register for the capacity.



 It's going to be forbidden from actually being null in the future when
 the Option-like enum optimization is applied to it via an attribute.
 This work has already landed - calling exchange_free on a zero-size
 allocation is *forbidden*.

As mentioned elsewhere on this thread, we can use another invalid
pointer value to represent either Option-None or 0 capacity, depending
on which is more efficient.

Manu



Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-04 Thread Daniel Micay
On 04/04/14 10:51 AM, Manu Thambi wrote:
 
 Needing to use a header seriously hurts the performance. The new
 vector is 7x faster at pushing elements when space isn't reserved
 compared to the old one, all due to leaving off the length/capacity
 header. The overhead would be less if it stored the capacity inside
 *and* outside the vector, but it's still overhead. It's an extra
 overflow check branch along with needing to calculate padding for
 alignment in the future, extra space in the memory allocation and more
 pointer aliasing issues. 
 Perhaps I am not understanding you correctly. Assuming that the capacity
 is stored inside and outside Vec, the only overhead
 I see is during allocation/deallocation. Otherwise the code will be
 identical.

It bloats the code size by requiring extra overflow checks in functions
like `push`, which impacts performance. Unwinding prevents many LLVM
passes from doing their job, since it adds significant complexity to the
control flow.

In addition to this, there is even an impact on the performance of
immutable operations like indexing. There's a need to calculate the
offset to the first element in the vector, which includes compensating
for alignment because there can be padding in between the capacity and
the first element in the vector.
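
The offset calculation described here can be sketched with the standard `std::alloc::Layout` API of present-day Rust. This is only an illustration under assumed names (`first_element_offset` and the over-aligned `Wide` type are hypothetical), not the code of the old vector implementation.

```rust
use std::alloc::Layout;

/// Byte offset from the start of an allocation to element 0 when a `usize`
/// capacity header is stored in front of `len` elements of type `T`.
/// Returns `None` if any size computation overflows.
fn first_element_offset<T>(len: usize) -> Option<usize> {
    let header = Layout::new::<usize>();
    let elems = Layout::array::<T>(len).ok()?;
    // `extend` inserts whatever padding `T`'s alignment requires and
    // fails on arithmetic overflow.
    let (_whole, offset) = header.extend(elems).ok()?;
    Some(offset)
}

/// Hypothetical over-aligned element type, used only to force padding.
#[allow(dead_code)]
#[repr(align(16))]
struct Wide([u8; 16]);

fn main() {
    // No padding needed: the elements start right after the header.
    println!("u8:   {:?}", first_element_offset::<u8>(4));
    // Padding needed: the elements start at the next 16-byte boundary.
    println!("Wide: {:?}", first_element_offset::<Wide>(4));
}
```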

You can deny that this has performance implications, but the fact is
that I have looked at the performance and code size impact in depth and
have hard numbers from benchmarks proving that there is an enormous
performance overhead for this choice.

 If you are worried about space, there is a cost of
 passing around Vecs (vs ~[T]), which consumes an extra register for
 the capacity.

Passing vectors around by-value isn't a common operation. In the common
case, functions operate on mutable or immutable borrowed slices. In
uncommon cases, they operate on `&mut Vec<T>` in order to change the
length in place. There are rare cases when ownership needs to be moved,
but it's rare for it not to correspond by a constant factor to the
number of allocations.
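
A small sketch of that split in present-day Rust syntax (the function names are invented for the example): read-only and in-place work goes through borrowed slices, and only a length change needs the vector itself, still by reference.

```rust
/// Read-only access: a borrowed slice accepts vectors, arrays and
/// sub-ranges alike.
fn sum(xs: &[u32]) -> u32 {
    xs.iter().sum()
}

/// In-place mutation without changing the length: a mutable slice suffices.
fn double_all(xs: &mut [u32]) {
    for x in xs.iter_mut() {
        *x *= 2;
    }
}

/// Changing the length needs the vector itself, but still only by reference.
fn push_sum(xs: &mut Vec<u32>) {
    let total = sum(xs.as_slice());
    xs.push(total);
}

fn main() {
    let mut v = vec![1, 2, 3];
    double_all(&mut v);
    push_sum(&mut v);
    println!("{:?}", v);
}
```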

 It's going to be forbidden from actually being null in the future when
 the Option-like enum optimization is applied to it via an attribute.
 This work has already landed - calling exchange_free on a zero-size
 allocation is *forbidden*.
 As mentioned elsewhere on this thread, we can use another invalid
 pointer value to represent
 either Option-None or 0 capacity depending on which is more efficient.

I've already implemented support for this in the compiler some time ago
and the library portion is now in master. This means it's invalid to
call exchange_free on an allocation with a zero size capacity, so slices
need to track whether the allocation is zero size. A zero size length
does not imply a zero size capacity unless `Vec<T>` -> `~[T]` is not a
no-op, which is what I am saying. Commits:

1778b6361627c5894bf75ffecf427573af02d390
898669c4e203ae91e2048fb6c0f8591c867bccc6





Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-04 Thread Daniel Micay
On 04/04/14 01:50 PM, Manu Thambi wrote:

 As Nathan mentioned, the capacity is stored at a negative offset to the
 pointer to the heap.

Storing at a negative index removes the cost at indexing, but not
elsewhere. It still consumes more memory and makes `push` slower,
especially since it has to do more than one offset based on alignment
with at least one overflow check.

 So the Vec code should be identical, except that during
 allocation/re-allocation, we need
 to compute the heap pointer by adding sizeof(uint) to the value returned
 by malloc().
 (and the opposite computation on free())
 
 indexing, etc, will not change, from how it is done now.

It has to check for overflow on any addition like this. The inability to
pass a size to `dealloc` is not going to be free either. Teaching LLVM
to understand the pointer gymnastics here means trying to make it
simpler rather than allowing it to become more complicated.
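
A minimal sketch of the extra check being referred to, in present-day Rust. Both helper names are made up, and alignment padding between header and elements is deliberately ignored to keep the example short.

```rust
use std::mem::size_of;

/// Bytes to request for `cap` elements of `T` preceded by a `usize` header.
/// Returns `None` if the computation would overflow `usize`.
fn header_alloc_size<T>(cap: usize) -> Option<usize> {
    // This multiplication check is needed with or without a header.
    let payload = size_of::<T>().checked_mul(cap)?;
    // This addition is the extra overflow check the header introduces.
    payload.checked_add(size_of::<usize>())
}

/// The same computation when the capacity lives only in the handle.
fn plain_alloc_size<T>(cap: usize) -> Option<usize> {
    size_of::<T>().checked_mul(cap)
}

fn main() {
    println!("with header: {:?}", header_alloc_size::<u32>(4));
    println!("no header:   {:?}", plain_alloc_size::<u32>(4));
    // The largest byte count fits without a header but not with one.
    println!("edge case:   {:?}", header_alloc_size::<u8>(usize::MAX));
}
```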

 Passing vectors around by-value isn't a common operation. In the
 common case, functions operate on mutable or immutable borrowed
 slices. In uncommon cases, they operate on `&mut Vec<T>` in order to
 change the length in place. There are rare cases when ownership needs
 to be moved, but it's rare for it not to correspond by a constant
 factor to the number of allocations.
 
 I agree that passing around Vec by value is uncommon. But you seem to be
 concerned about Vec<T> -> ~[T] performance, which should also be a rare
 transfer of ownership.

I'm not at all concerned about it. I think it would be a huge mistake to
use `~[T]` frequently at all, and I'm simply pointing out that this is
not going to be a no-op because that claim was made several times.

 I've already implemented support for this in the compiler some time ago
 and the library portion is now in master. This means it's invalid to
 call exchange_free on an allocation with a zero size capacity, so slices
 need to track whether the allocation is zero size. A zero size length
 does not imply a zero size capacity unless `Vec<T>` -> `~[T]` is not a
 no-op, which is what I am saying. Commits:

 1778b6361627c5894bf75ffecf427573af02d390
 898669c4e203ae91e2048fb6c0f8591c867bccc6
 
 I understand that we cannot call free with a zero size/capacity.
 
 There are three possibilities:
 
 a) Use the special pointer value to represent Option::None. The
 Vec<T> -> ~[T] would be a no-op.

An empty vector is not the same as `None`. Reserving an address is also
not possible in all environments Rust is going to be used in as a
language, and I think it should be up to the allocator implementation
rather than hard-coded knowledge in the compiler. At the moment, the
`Some(~())` problem is fixed with no overhead anywhere, and allocators
have the choice between a sentinel and clamping zero-size allocations to 1.

 b) If that makes implementation of Option complicated, then use the
 special pointer value to represent a zero capacity. We can use that
 special value in Vec<T> as well, even though it is not needed. This
 will keep Vec<T> -> ~[T] a no-op.

This will add a branch to every deallocation call.

 c) Conversion between Vec<T> -> ~[T] is not likely to be common. So,
 doing an additional check is okay?

It's not about there being an additional check. It's about it having to
drop excess capacity, which will make conversions to and from `~[T]`
hurt. This can easily result in higher time complexity rather than just
a constant factor slowdown.

I don't think conversion from `Vec<T>` -> `~[T]` is important, and I
just want to make it clear that there's no way it is going to be free.

The cost can not simply be hand-waved away by moving it elsewhere, such
as requiring new branches and losing the ability to pass a size to
`dealloc`.
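
The cost of dropping excess capacity can be observed in present-day Rust, taking `Vec::into_boxed_slice` and `Box<[T]>` as stand-ins for the conversion discussed here. The helper `round_trip` is a made-up name for this sketch.

```rust
/// Round-trips a vector through an owned slice and reports the capacity
/// it had before, together with the vector that comes back.
fn round_trip(v: Vec<u32>) -> (usize, Vec<u32>) {
    let before = v.capacity();
    // The conversion discards spare capacity, much like `shrink_to_fit`.
    let boxed: Box<[u32]> = v.into_boxed_slice();
    (before, boxed.into_vec())
}

fn main() {
    let mut v: Vec<u32> = Vec::with_capacity(100);
    v.extend_from_slice(&[1, 2, 3]);

    let (before, after) = round_trip(v);
    println!("capacity before: {}", before);
    println!("capacity after:  {}", after.capacity());
}
```

Because the spare room is gone after each round trip, the next `push` has to reallocate. Code that converts back and forth inside a loop can therefore end up copying the whole buffer on every iteration, which is where the higher time complexity comes from.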





Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-04 Thread Manu Thambi


On 04/04/2014 02:51 PM, Daniel Micay wrote:


 Storing at a negative index removes the cost at indexing, but not
 elsewhere. It still consumes more memory and makes `push` slower,
 especially since it has to do more than one offset based on alignment
 with at least one overflow check.

In the negative index scheme, the length and capacity in the Vec would
be identical to what they are in the current implementation. Hence the
code will be identical, except while allocating/deallocating (i.e.,
push() would have the same performance).



 It has to check for overflow on any addition like this. The inability to
 pass a size to `dealloc` is not going to be free either. Teaching LLVM
 to understand the pointer gymnastics here means trying to make it
 simpler rather than allowing it to become more complicated.

I don't understand what addition you mean. The only time you need the
size stored in the negative index is to call dealloc.

You absolutely can pass size into dealloc while destructing ~[T]. Just
use the size stored in the negative index.



 I'm not at all concerned about it. I think it would be a huge mistake to
 use `~[T]` frequently at all, and I'm simply pointing out that this is
 not going to be a no-op because that claim was made several times.

It will be a no-op if you use null (0) to indicate 0-capacity, and a
special value (1?) to indicate Option::None.



 An empty vector is not the same as `None`. Reserving an address is also
 not possible in all environments Rust is going to be used in as a
 language, and I think it should be up to the allocator implementation
 rather than hard-coded knowledge in the compiler. At the moment, the
 `Some(~())` problem is fixed with no overhead anywhere, and allocators
 have the choice between a sentinel and clamping zero-size allocations to 1.

Can you name one architecture where we are not able to find a single
extra invalid virtual address other than 0?

Just to be clear, the negative index scheme will allow free() to take
the size argument.



 b) If that makes implementation of Option complicated, then use the
 special pointer value to represent a zero capacity. We can use that
 special value in Vec<T> as well, even though it is not needed. This
 will keep Vec<T> -> ~[T] a no-op.

 This will add a branch to every deallocation call.

No, it wouldn't. Vec doesn't have to check the pointer. Just check the
capacity.



 c) Conversion between Vec<T> -> ~[T] is not likely to be common. So,
 doing an additional check is okay?

 It's not about there being an additional check. It's about it having to
 drop excess capacity, which will make conversions to and from `~[T]`
 hurt. This can easily result in higher time complexity rather than just
 a constant factor slowdown.

 I don't think conversion from `Vec<T>` -> `~[T]` is important, and I
 just want to make it clear that there's no way it is going to be free.

 The cost can not simply be hand-waved away by moving it elsewhere, such
 as requiring new branches and losing the ability to pass a size to
 `dealloc`.

The negative index scheme does not require you to drop excess capacity.
With this scheme, ~[T] and Vec<T> would contain the same amount of info.
The only difference is that in ~[T], the capacity is stored at a negative
index. In Vec<T>, capacity is stored both inline and at the negative
index.

The only overhead would be a couple of checks/additions during
allocation/deallocation. Everything else would perform exactly as it
does now.

Manu



Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-04 Thread Daniel Micay
On 04/04/14 04:12 PM, Manu Thambi wrote:
 
 On 04/04/2014 02:51 PM, Daniel Micay wrote:

 Storing at a negative index removes the cost at indexing, but not
 elsewhere. It still consumes more memory and makes `push` slower,
 especially since it has to do more than one offset based on alignment
 with at least one overflow check.
 
 In the negative index scheme, the length and capacity in the Vec would
 be identical to what they are in the current implementation. Hence the
 code will be identical, except while allocating/deallocating (i.e.,
 push() would have the same performance).

It won't have the same performance, because the performance hit comes
from the code size increase needed to handle offsetting and overflow
checking along with aliasing issues.

It was slow because a header involves offsets and overflow checks. It
also screws up the alias analysis. The negative index solution suffers
from this almost as much as the old vector representation.

I feel I've made the reasons why it's slower clear and you simply don't
believe what I said. The performance gains from removing the header from
vectors weren't imaginary. Even a better implementation than the one in
`std::slice` is still slower.

 It has to check for overflow on any addition like this. The inability to
 pass a size to `dealloc` is not going to be free either. Teaching LLVM
 to understand the pointer gymnastics here means trying to make it
 simpler rather than allowing it to become more complicated.
 I don't understand what addition you mean? The only time you need the
 size stored in the negative index
 is to call dealloc.
 
 You absolutely can pass size into dealloc while destructing ~[T]. Just
 use the size, stored in the negative index.

You can pass it for the negative index proposal, but not the other
proposals. The negative index proposal involves bloating the `Vec<T>`
type to micro-optimize what is going to be an incredibly rare
conversion, while the other proposals lose the ability to pass the
length. I don't see a valid reason to change the status quo.

 I'm not at all concerned about it. I think it would be a huge mistake to
 use `~[T]` frequently at all, and I'm simply pointing out that this is
 not going to be a no-op because that claim was made several times.
 It will be a no-op if you use null (0) to indicate 0-capacity, and a
 special value (1?) to indicate Option::None.

You can't use a special value to indicate None without adding a lang
item, no other pointer values are specified by Rust or LLVM as being
invalid.

 An empty vector is not the same as `None`. Reserving an address is also
 not possible in all environments Rust is going to be used in as a
 language, and I think it should be up to the allocator implementation
 rather than hard-coded knowledge in the compiler. At the moment, the
 `Some(~())` problem is fixed with no overhead anywhere, and allocators
 have the choice between a sentinel and clamping zero-size allocations
 to 1.
 Can you name one architecture, where we are not able to find a single
 extra invalid virtual address
 other than 0?

Whether or not *I* can name such an architecture doesn't matter.

Rust is meant to be a portable language, even to platforms this specific
contributor is not familiar with.

This would add a dependency on global variables for unique pointers,
even though you could implement them in an environment with only a
stack using a fixed-size pool.

 Just to be clear, the negative index scheme will allow free() to take
 the size argument.

I'm talking about all of the proposed solutions such as the ones at the
end of your message in isolation from the proposal to require `Vec<T>`
to have a header (not going to happen).

 b) If that makes implementation of Option complicated, then use the
 special pointer value to represent a zero capacity. We can use that
 special value in Vec<T> as well, even though it is not needed. This
 will keep Vec<T> -> ~[T] a no-op.

 This will add a branch to every deallocation call.
 
 No, it wouldn't. Vec doesn't have to check the pointer. Just check the
 capacity.

Checking the capacity is a branch.
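
A sketch of where that branch ends up, written against the `std::alloc` API of present-day Rust. `RawBuf` is a hypothetical type invented for this illustration and not the actual vector implementation; it uses a dangling sentinel for zero capacity and passes the size back to the allocator on free.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Hypothetical raw byte buffer; zero capacity is represented by a
/// dangling sentinel pointer rather than a real allocation.
struct RawBuf {
    ptr: NonNull<u8>,
    cap: usize,
}

impl RawBuf {
    fn with_capacity(cap: usize) -> RawBuf {
        if cap == 0 {
            return RawBuf { ptr: NonNull::dangling(), cap: 0 };
        }
        let layout = Layout::array::<u8>(cap).expect("capacity overflow");
        // SAFETY: `layout` has non-zero size because `cap > 0`.
        let raw = unsafe { alloc(layout) };
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        RawBuf { ptr, cap }
    }

    fn owns_allocation(&self) -> bool {
        self.cap != 0
    }
}

impl Drop for RawBuf {
    fn drop(&mut self) {
        // This is the branch: the allocator must never see the sentinel.
        if self.cap != 0 {
            let layout = Layout::array::<u8>(self.cap).unwrap();
            // SAFETY: `ptr` came from `alloc` with this same layout, so the
            // size handed back to the allocator matches the original request.
            unsafe { dealloc(self.ptr.as_ptr(), layout) };
        }
    }
}

fn main() {
    let empty = RawBuf::with_capacity(0);
    let full = RawBuf::with_capacity(64);
    println!("empty owns allocation: {}", empty.owns_allocation());
    println!("full owns allocation:  {}", full.owns_allocation());
}
```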

 The only overhead would be a couple of checks/additions during
 allocation/deallocation. Everything
 else would perform exactly as it does now.

It will cause `push` to perform worse than it does now and it will cause
`Vec<T>` to allocate more memory. All to micro-optimize a conversion to
a nearly useless type. I've made it clear why adding headers to vectors
decreases the performance.

You clearly don't believe me and I won't be wasting my time on this
thread anymore.





Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-04 Thread Manu Thambi
Most of your comments below do not apply to a properly implemented
negative index scheme.

So, it seems clear to me that I haven't been able to get it across to you.

I guess we can both agree that spending more time on this thread is
unproductive, especially since the real question is whether we would
*want* to have ~[T] used.

Thank you for your time.

Manu

On 04/04/2014 05:09 PM, Daniel Micay wrote:

 [...]

--
Manu Thambi
Mesh Capital, LLC

Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-03 Thread comex
On Wed, Apr 2, 2014 at 9:21 PM, Daniel Micay danielmi...@gmail.com wrote:
 I used a sentinel value in my fix along with providing a guarantee that
 `free` is never called on zero-size allocation. That's the end of any
 no-op `Vec<T>` -> `~[T]` conversions since it will need to free a zero
 size allocation. It's not far from just calling `shrink_to_fit`, and
 allowing for passing a size to `free`.

 https://github.com/mozilla/rust/pull/13267

I see the benefit of free knowing the size in this case, although it
seems that it would strongly call for type-level integers to avoid
needing a special case in the compiler.

I don't think this issue necessarily guarantees Vec<T> can't be freely
converted to ~[T]. You could hypothetically special case allocations
for zero-sized types, while keeping all other allocations real
(including zero sized, since the impact would be minimal).
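
For reference, the distinction between zero-sized element types and zero-capacity vectors can be sketched with the `Vec<T>` of present-day Rust. This shows how the standard vector behaves today, not necessarily the special case proposed here, and `fill` is a made-up helper.

```rust
/// Pushes `n` default values and returns (length, capacity).
fn fill<T: Default>(n: usize) -> (usize, usize) {
    let mut v: Vec<T> = Vec::new();
    for _ in 0..n {
        v.push(T::default());
    }
    (v.len(), v.capacity())
}

fn main() {
    // Zero-sized element type: no allocation is ever made, and the
    // reported capacity is `usize::MAX`.
    println!("{:?}", fill::<()>(1000));
    // Ordinary element type with zero elements: also no allocation,
    // but here the capacity is reported as 0.
    println!("{:?}", fill::<u64>(0));
}
```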

 You're talking about allocators designed around the limitation of an
 API. The design no longer needs to make the same compromises if you're
 going to know the size. The difference between no cache miss and a cache
 miss is not insignificant...

I explained why I think a chunk header is necessary in any case.
Maybe it is still a significant win.  The C++14 proposal claims Google
found one with GCC and tcmalloc, although tcmalloc is rather
inefficient to start with... I would like to see numbers.

Then again, I agree with the other reasons that using ~[T] is a bad
idea, so I have no particular reason to disagree with having the size
parameter either.


Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-03 Thread Huon Wilson

On 03/04/14 17:15, comex wrote:



You're talking about allocators designed around the limitation of an
API. The design no longer needs to make the same compromises if you're
going to know the size. The difference between no cache miss and a cache
miss is not insignificant...

I explained why I think a chunk header is necessary in any case.
Maybe it is still a significant win.  The C++14 proposal claims Google
found one with GCC and tcmalloc, although tcmalloc is rather
inefficient to start with... I would like to see numbers.


Really? I was under the impression that tcmalloc was one of the faster 
allocators in common use. e.g. two posts I found just now via Google:


- https://github.com/blog/1422-tcmalloc-and-mysql
- http://www.mysqlperformanceblog.com/2013/03/08/mysql-performance-impact-of-memory-allocators-part-2/



Huon


Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-03 Thread Daniel Micay
On 03/04/14 02:15 AM, comex wrote:
 On Wed, Apr 2, 2014 at 9:21 PM, Daniel Micay danielmi...@gmail.com wrote:
 I used a sentinel value in my fix along with providing a guarantee that
 `free` is never called on zero-size allocation. That's the end of any
 no-op `Vec<T>` -> `~[T]` conversions since it will need to free a zero
 size allocation. It's not far from just calling `shrink_to_fit`, and
 allowing for passing a size to `free`.

 https://github.com/mozilla/rust/pull/13267
 
 I see the benefit of free knowing the size in this case, although it
 seems that it would strongly call for type-level integers to avoid
 needing a special case in the compiler.

I'm not sure how type-level integers help. This size is often a dynamic
one and this doesn't involve any special cases in the compiler. The
conversion between Vec<T> and ~[T] is entirely a library feature.

 I don't think this issue necessarily guarantees Vec<T> can't be freely
 converted to ~[T]. You could hypothetically special case allocations
 for zero-sized types, while keeping all other allocations real
 (including zero sized, since the impact would be minimal).

Vec<T> won't be convertible to ~[T] with a no-op after the fix for
`Some(~())` lands:

https://github.com/mozilla/rust/pull/13267

It will need to free the allocation if it is zero-size. Calling
`shrink_to_fit()` isn't far from that and allows passing the length to
the free function.

Extending the Option-like enum optimization to other types like `Rc<T>`
and `Vec<T>` is planned so this issue applies to them too.

 You're talking about allocators designed around the limitation of an
 API. The design no longer needs to make the same compromises if you're
 going to know the size. The difference between no cache miss and a cache
 miss is not insignificant...
 
 I explained why I think a chunk header is necessary in any case.
 Maybe it is still a significant win.  The C++14 proposal claims Google
 found one with GCC and tcmalloc, although tcmalloc is rather
 inefficient to start with... I would like to see numbers.
 
 Then again, I agree with the other reasons that using ~[T] is a bad
 idea, so I have no particular reason to disagree with having the size
 parameter either.






Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-03 Thread Niko Matsakis
On Wed, Apr 02, 2014 at 09:21:56PM -0400, Daniel Micay wrote:
 ...A distinct `~[T]` and `Vec<T>` will make the language more
 painful to use...

This is precisely the matter of debate, isn't it? I personally see two
sides to this, which is why I was suggesting that maybe we should wait
until we can gain a bit more experience before making a final decision
here.



Niko


Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-03 Thread Huon Wilson

On 03/04/14 10:22, Niko Matsakis wrote:

On Wed, Apr 02, 2014 at 04:03:37PM -0400, Daniel Micay wrote:

I have no sane proposal to fix this beyond passing a size to free.

I don't believe there is a problem with just not using null to
represent such pointers (for example, 1 would suffice). This does
impose some additional burdens on slice conversion and the like.

This conversation has focused on low-level effects, which is important
to understand, but I think the bigger question is: how do we WANT the
language to look? Is it useful to have a distinct `Vec<T>` and `~[T]`
or -- in our ideal world -- would they be the same? I think we can
make the interconversion fast for the default allocator, but we should
design for the language we want to use.

I could go either way on this. In the kind of programs I write, at
least, most vectors get built up to a specific length and then stop
growing (frequently they stop changing as well, but not
always). Sometimes they continue growing. I actually rather like the
idea of using `Vec<T>` as a kind of builder and `~[T]` as the
end-product. In those cases where the vector continues to grow, of
course, I can just keep the `Vec<T>` around. Following this logic, I
would imagine that most APIs want to consume and produce `~[T]`, since
they consume and produce end products.
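
The builder / end-product split can be sketched in present-day Rust, taking `Box<[T]>` as the stand-in for the DST `~[T]`. The function `squares` is invented for the example and is not a proposed library API.

```rust
/// Builds with a growable `Vec<T>`, then hands out a fixed-size owned slice.
fn squares(n: u32) -> Box<[u32]> {
    let mut out = Vec::with_capacity(n as usize);
    for i in 0..n {
        out.push(i * i);
    }
    // End product: the length is now fixed and spare capacity is dropped.
    out.into_boxed_slice()
}

fn main() {
    let fixed = squares(4);
    println!("{:?}", fixed);

    // A caller that wants to keep growing converts back to a vector first.
    let mut growable = fixed.into_vec();
    growable.push(16);
    println!("{:?}", growable);
}
```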


I don't think the basic routines returning vectors in libstd etc. are 
producing end-products; they are fundamental building blocks, and their 
output will be used in untold ways. (There are not many that consume 
`~[T]`s by-value.)




On the other hand, I could imagine and appreciate an argument that we
should just take and produce `Vec<T>`, which gives somewhat more
flexibility. In general, Rust takes the philosophy that if you own
it, you can mutate it, so why make growing harder than it needs to
be? Preferring Vec<T> also means fewer choices, usually a good thing.

Perhaps the best thing is to wait a month (or two or three) until DST
is more of a reality and then see how we feel.


Are you thinking we should also wait before converting the current uses 
of ~[T] to Vec<T>? Doing the migration gives us the performance[1] and
zero-length-zero-alloc benefits, but there were some concerns about 
additional library churn if we end up converting back to DST's ~[T].


(I'd also guess doing a complete migration now would make the transition 
slightly easier: no need for staging the libstd changes, and it would 
allow the current ~[] handling to be removed from libsyntax/librustc 
completely, leaving a slightly cleaner slate.)




Huon


[1]: https://github.com/mozilla/rust/issues/8981


Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-03 Thread Daniel Micay
On 03/04/14 01:22 PM, Ziad Hatahet wrote:
 Would it be useful to look at what other languages are doing? For
 instance, slices in Go are appendable, so perhaps it would be worth
 looking at code bases written in Go to see how they deal with slices, or
 how often they append to slices returned from standard library routines.
 
 --
 Ziad

Go doesn't have an equivalent to what `~[T]` will be.

`std::unique_ptr<T[]>` is rarely used in C++, and exists solely for
interoperability with legacy code. Wrapping legacy resources is a common
use case for std::unique_ptr in C++, which is why it takes a deleter
parameter. For example, a lone function returning a FILE * pointer can be
dealt with by doing
`std::unique_ptr<FILE, decltype(&fclose)> file(get_file(), &fclose)`.





Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-03 Thread Ziad Hatahet
On Thu, Apr 3, 2014 at 11:09 AM, Daniel Micay danielmi...@gmail.com wrote:

 Go doesn't have an equivalent to what `~[T]` will be.


Which was my point. From what I understand, Go's slices are analogous to
Rust's Vec<T> in that they are growable. So I was suggesting perusing
existing Go code bases to see how often slices returned from standard
library routines are appended to; which seems to be one motivator for ~[T].


--
Ziad


Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-03 Thread Nathan Myers



Perhaps the best thing is to wait a month (or two or three) until DST
is more of a reality and then see how we feel.


Are you thinking we should also wait before converting the current uses
of ~[T] to Vec<T>? Doing the migration gives us the performance[1] and
zero-length-zero-alloc benefits, but there were some concerns about
additional library churn if we end up converting back to DST's ~[T].


I can't speak about how a usage choice affects the standard library,
but it seems worth mentioning that vector capacity doesn't have to be
in the base object; it can live in the secondary storage, prepended
before the elements.  A zero-length Vec<T> might be null for the
case of zero capacity, or non-null when it has room to grow.  For
maximally trivial conversion to ~[T], the pointer in Vec<T> would
point to the first element, with the capacity at a negative offset.
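
A rough sketch of the layout described above, written in present-day Rust rather than the 2014 dialect used in this thread. Everything here is hypothetical and for illustration only: `HeaderVec`, its `u32` element type, and the `HEADER` constant are made-up names, and growth is left out. It only shows that the handle shrinks to two words while every capacity check has to load from the heap, which is the cost raised elsewhere in the thread.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::mem::{align_of, size_of};

// Bytes reserved in front of the elements for the capacity header.
// Valid here because align_of::<u32>() <= align_of::<usize>().
const HEADER: usize = size_of::<usize>();

/// Hypothetical two-word vector of u32: the capacity lives on the heap,
/// immediately before the first element.
struct HeaderVec {
    data: *mut u32, // first element; null means "nothing allocated"
    len: usize,
}

impl HeaderVec {
    fn layout(cap: usize) -> Layout {
        let bytes = cap
            .checked_mul(size_of::<u32>())
            .and_then(|n| n.checked_add(HEADER))
            .expect("capacity overflow");
        Layout::from_size_align(bytes, align_of::<usize>()).expect("bad layout")
    }

    fn with_capacity(cap: usize) -> HeaderVec {
        if cap == 0 {
            return HeaderVec { data: std::ptr::null_mut(), len: 0 };
        }
        let layout = Self::layout(cap);
        unsafe {
            let base = alloc(layout);
            if base.is_null() {
                handle_alloc_error(layout);
            }
            (base as *mut usize).write(cap); // store the capacity header
            HeaderVec { data: base.add(HEADER) as *mut u32, len: 0 }
        }
    }

    /// Reading the capacity costs a null check plus a load from the heap.
    fn capacity(&self) -> usize {
        if self.data.is_null() {
            return 0;
        }
        unsafe { ((self.data as *const u8).sub(HEADER) as *const usize).read() }
    }

    /// Returns false when full; growth is omitted to keep the sketch short.
    fn push(&mut self, value: u32) -> bool {
        if self.len == self.capacity() {
            return false;
        }
        unsafe { self.data.add(self.len).write(value) };
        self.len += 1;
        true
    }

    fn as_slice(&self) -> &[u32] {
        if self.data.is_null() {
            return &[];
        }
        unsafe { std::slice::from_raw_parts(self.data, self.len) }
    }
}

impl Drop for HeaderVec {
    fn drop(&mut self) {
        if !self.data.is_null() {
            let layout = Self::layout(self.capacity());
            unsafe { dealloc((self.data as *mut u8).sub(HEADER), layout) };
        }
    }
}

fn main() {
    let mut v = HeaderVec::with_capacity(2);
    v.push(10);
    v.push(20);
    println!("{:?} capacity={}", v.as_slice(), v.capacity());
}
```

Handing `data` and `len` to a slice-like owner would be trivial with this layout, but freeing it still requires stepping back over the header.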

Nathan Myers




Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Daniel Micay
On 02/04/14 11:35 AM, Alex Crichton wrote:
 I've noticed recently that there seems to be a bit of confusion about the fate
 of ~[T] with an impending implementation of DST on the horizon. This has been
 accompanied with a number of pull requests to completely remove many uses of
 ~[T] throughout the standard distribution. I'd like to take some time to
 straighten out what's going on with Vec<T> and ~[T].

I think this is a difference of opinion, not confusion. The original
pull requests switching `~[T]` to `Vec<T>` were done by pcwalton, and
this was with full knowledge of the plans for `~[T]`.

 # Vec<T>
 
 In a post-DST world, Vec<T> will be the vector builder type. It will be the
 only type for building up a block of contiguous elements. This type exists
 today, and lives inside of std::vec. Today, you cannot index Vec<T>, but this
 will be enabled in the future once the indexing traits are fleshed out.

It will be Rust's vector (dynamic array) type. I don't think it makes
sense to call it a 'builder' any more than it makes sense to call
`HashMap<K, V>` a 'hash table builder'. It makes something simple far
more complicated than it needs to be.

 This type will otherwise largely not change from what it is today. It will
 continue to occupy three words in memory, and continue to have the same
 runtime semantics.
 
 # ~[T]
 
 The type ~[T] will still exist in a post-DST world, but its representation
 will change. Today, a value of type ~[T] is one word (I'll elide the details
 of this for now). After DST is implemented, ~[T] will be a two-word value of
 the length and a pointer to an array (similarly to what slices are today).
 The ~[T] type will continue to have move semantics, and you can borrow it to
 &[T] as usual.

The `~[T]` type will exist because `[T]` will exist as a type. It won't
be an explicit choice to support having it. Some of us consider it an
unfortunate consequence of DST rather than a useful type.
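
For readers following along with a current toolchain, the word counts under discussion can be checked directly. This is only an approximation of the 2014 design: it assumes `Box<[T]>` as today's analogue of the post-DST `~[T]`, and the helper `words` is a made-up name. The layout of `Vec<T>` is not guaranteed by the language, so treat the numbers as what current compilers happen to produce.

```rust
use std::mem::size_of;

/// Size of `T` measured in machine words (hypothetical helper).
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // Growable vector: pointer, capacity, length.
    println!("Vec<u8>:   {} words", words::<Vec<u8>>());
    // Owned slice: pointer and length only.
    println!("Box<[u8]>: {} words", words::<Box<[u8]>>());
    // Borrowed slice: same shape as the owned slice.
    println!("&[u8]:     {} words", words::<&[u8]>());
}
```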

 The major difference between today's ~[T] type and a post-DST ~[T] is that the
 push() method will be removed. There is no knowledge of a capacity in the
 representation of a ~[T] value, so a push could not be supported at all. In
 theory a pop() can be efficiently supported, but it will likely not be
 implemented at first.

A `pop` or `shift` function is impossible to implement efficiently if
allocators require a size to be passed to `free`.

 # [T]
 
 As part of DST, the type grammar will start accepting [T] as a possible
 substitute for type parameters. This basically means that if your type
 parameter is T, then [U] can satisfy the type parameter.
 
 While possible, I imagine that it will be rare for this to appear in APIs.
 This is an unsized type, which means that you can do less with it than you
 can with a sized type.
 
 The full details of [T] will become apparent once DST is implemented, but it's
 safe to say that APIs and usage should rarely have to deal with this type, and
 it will likely be mostly transparent.
 
 # Converting between Vec<T> and ~[T]
 
 Conversions between these two types will be provided, and the default
 implementations will be free. Converting from Vec<T> to ~[T] will be simply
 forgetting the capacity, and converting from ~[T] to Vec<T> will set the
 capacity to the length.

Converting from `Vec<T>` to `~[T]` will not be free with an efficient
allocation scheme. I don't think Rust will want to be using a legacy
`malloc`/`free` style API as the underlying default allocator in the
future. I see it only as a temporary measure before a modern allocation
model is implemented.

Without a size parameter to `free`, an allocator needs to track the size
of allocations manually. It increases the memory overhead, along with
adding bookkeeping overhead.

C++ allocators take a `size` parameter to the `deallocate` function for
this reason and I expect Rust will want to do the same. The design of
`malloc` and `free` is far from ideal, because the length is either
known statically or dynamically in nearly every case.

I think leaving out the capacity field of vectors in some cases without
dropping the excess capacity is an insignificant micro-optimization.
In contrast, passing the length to `free` is quite valuable and will
result in a measurable performance win across nearly all Rust code with
an allocator taking advantage of it.
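
As a point of reference, the allocation API in today's standard library has this shape: `std::alloc::dealloc` takes the same `Layout` (size and alignment) that was passed to `alloc`. The sketch below is a minimal illustration in present-day Rust, not code from 2014; the function name `sum_with_sized_free` is made up for the example.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Allocates `n` u64 values, fills them with 0..n, sums them, and frees
/// the block, passing the size back to the allocator on deallocation.
fn sum_with_sized_free(n: usize) -> u64 {
    assert!(n > 0, "`alloc` must not be called with a zero-size layout");
    let layout = Layout::array::<u64>(n).expect("capacity overflow");
    unsafe {
        let p = alloc(layout) as *mut u64;
        if p.is_null() {
            handle_alloc_error(layout);
        }
        for i in 0..n {
            p.add(i).write(i as u64);
        }
        let mut total = 0;
        for i in 0..n {
            total += p.add(i).read();
        }
        // The caller supplies size and alignment; the allocator does not
        // have to look them up in a header of its own.
        dealloc(p as *mut u8, layout);
        total
    }
}

fn main() {
    println!("{}", sum_with_sized_free(4));
}
```

Because the caller must reproduce the exact layout, any owner of the block has to know its true capacity, which is why a conversion that merely forgets the capacity does not fit this model.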

 Helper methods will likely be provided to perform a forceful reallocating
 shrink when going from Vec<T> to ~[T], but it will not be the default.

It has to be the *only* way to do it if Rust is going to be able to
switch to an efficient allocation model in the future. The API of
`malloc`, `realloc` and `free` is purely a legacy wart and shouldn't
drive the design of a new language/library.

 ## The cost of Vec<T> => ~[T]
 
 Some concerns have been brought up that this can in theory be a costly
 transition under the assumption that this does a reallocation of memory to
 shrink to the capacity to exactly the length. This will likely not be the
 

Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Patrick Walton

On 4/2/14 9:25 AM, Daniel Micay wrote:

On 02/04/14 11:35 AM, Alex Crichton wrote:

I've noticed recently that there seems to be a bit of confusion about the fate
of ~[T] with an impending implementation of DST on the horizon. This has been
accompanied with a number of pull requests to completely remove many uses of
~[T] throughout the standard distribution. I'd like to take some time to
straighten out what's going on with Vec<T> and ~[T].


I think this is a difference of opinion, not confusion. The original
pull requests switching `~[T]` to `Vec<T>` were done by pcwalton, and
this was with full knowledge of the plans for `~[T]`.


It was transitionary. I thought that we would have to fully extract 
`~[T]` from the language before DST would work, but it now seems likely 
that that won't need to happen.



The `~[T]` type will exist because `[T]` will exist as a type. It won't
be an explicit choice to support having it. Some of us consider it an
unfortunate consequence of DST rather than a useful type.


Even if you buy that `~[T]` is useless (which I'm not sure I do), it's
no more unfortunate than the fact that the type system allows useless
types like `Rc<Rc<Rc<int>>>`.



If `~[T]` remains used throughout the libraries, Rust will become
noisier than languages like C++ with a unified vector type. The need to
convert between `Vec<T>` and `~[T]` would add noise to lots of code,
without adding any measurable optimization win. A micro-optimization
shouldn't drive the design of the libraries, especially when it will
prevent making a significant *macro*-optimization (passing a length to
the deallocation function).


In practice C++ libraries use their own custom vector types all over the 
place, so I wouldn't say that Rust is going to be significantly noisier 
no matter what we do. Interoperability between different libraries is 
not a strong point of C++.


Besides, C++ has this too, with `unique_ptr<T[]>`. This Stack Overflow
answer is actually pretty illuminating:


http://stackoverflow.com/questions/16711697/is-there-any-use-for-unique-ptr-with-array

I think that length-frozen owned vectors are likely to be surprisingly 
common. We'll see.


Patrick



Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Daniel Micay
On 02/04/14 02:28 PM, Patrick Walton wrote:
 On 4/2/14 9:25 AM, Daniel Micay wrote:
 On 02/04/14 11:35 AM, Alex Crichton wrote:
 I've noticed recently that there seems to be a bit of confusion about
 the fate of ~[T] with an impending implementation of DST on the horizon.
 This has been accompanied with a number of pull requests to completely
 remove many uses of ~[T] throughout the standard distribution. I'd like
 to take some time to straighten out what's going on with Vec<T> and ~[T].

 I think this is a difference of opinion, not confusion. The original
 pull requests switching `~[T]` to `Vec<T>` were done by pcwalton, and
 this was with full knowledge of the plans for `~[T]`.
 
 It was transitionary. I thought that we would have to fully extract
 `~[T]` from the language before DST would work, but it now seems likely
 that that won't need to happen.
 
 The `~[T]` type will exist because `[T]` will exist as a type. It won't
 be an explicit choice to support having it. Some of us consider it an
 unfortunate consequence of DST rather than a useful type.
 
 Even if you buy that `~[T]` is useless (which I'm not sure I do), it's
 no more unfortunate than the fact that the type system allows useless
 types like `Rc<Rc<Rc<int>>>`.

No one is proposing that we use `Rc<Rc<Rc<int>>>` in the standard
library. Using `~[T]` instead of migrating to `Vec<T>` means there will
be conversion noise where there was not going to be conversion noise before.

 If `~[T]` remains used throughout the libraries, Rust will become
 noisier than languages like C++ with a unified vector type. The need to
 convert between `Vec<T>` and `~[T]` would add noise to lots of code,
 without adding any measurable optimization win. A micro-optimization
 shouldn't drive the design of the libraries, especially when it will
 prevent making a significant *macro*-optimization (passing a length to
 the deallocation function).
 
 In practice C++ libraries use their own custom vector types all over the
 place, so I wouldn't say that Rust is going to be significantly noisier
 no matter what we do. Interoperability between different libraries is
 not a strong point of C++.
 
 Besides, C++ has this too, with `unique_ptr<T[]>`. This Stack Overflow
 answer is actually pretty illuminating:

`std::unique_ptr<T[]>` is useful because lots of legacy code uses the
new[]/delete[] memory allocations. Unique pointers also take a custom
deleter parameter, because they're usable for managing stuff like files,
etc. in C++.

 I think that length-frozen owned vectors are likely to be surprisingly
 common. We'll see.

They'll certainly be common if the standard library forces many
conversions to and from `Vec<T>`... It should not be stated that this
conversion is free though, because it only remains free as long as
you're using a legacy allocation API like `malloc`. It's also not free
in terms of language complexity - people are going to wonder when they
should use each one, and I know I'm certainly going to be telling people
to use `Vec<T>` almost everywhere.





Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread comex
On Wed, Apr 2, 2014 at 12:25 PM, Daniel Micay danielmi...@gmail.com wrote:
 Without a size parameter to `free`, an allocator needs to track the size
 of allocations manually. It increases the memory overhead, along with
 adding bookkeeping overhead.

Not by very much...  If a chunk's header is stored externally, like
tcmalloc and Linux slub, there is virtually no memory overhead at the
cost of free involving a quick hash table lookup on the address; if
it's stored internally, like jemalloc, the overhead is just possibly
some page-size-remainder wastage, and free just masks the pointer.
Either way, if chunks are ever going to be freed, you need some kind
of header to count free slots.

I guess knowing the size would help the fast path for free be really
simple and even inlined, since it could just swap a fixed thread-local
variable.  But is that really worth hanging language features on, one
way or the other?


Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Clark Gaebel
Passing the size to free is currently in a C++14 proposal [1]. It's pretty
useful (makes free no slower, might make it faster) and in most code, the
size is available on free. I'm not sure it should be mandatory, but
it's definitely useful.

[1] http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3536.html


-- 
Clark.

Key ID : 0x78099922
Fingerprint: B292 493C 51AE F3AB D016  DD04 E5E3 C36F 5534 F907


Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Daniel Micay
On 02/04/14 03:18 PM, Clark Gaebel wrote:
 Passing the size to free is currently in a C++14 proposal [1]. It's
 pretty useful (makes free no slower, might make it faster) and in most
 code, the size is available on free. I'm not sure it should be
 mandatory, but it's definitely useful.
 
 [1] http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3536.html

Allocators already do take the size, so it already works for containers,
etc.





Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Daniel Micay
On 02/04/14 03:13 PM, comex wrote:
 On Wed, Apr 2, 2014 at 12:25 PM, Daniel Micay danielmi...@gmail.com wrote:
 Without a size parameter to `free`, an allocator needs to track the size
 of allocations manually. It increases the memory overhead, along with
 adding bookkeeping overhead.
 
 Not by very much...  If a chunk's header is stored externally, like
 tcmalloc and Linux slub, there is virtually no memory overhead at the
 cost of free involving a quick hash table lookup on the address; if
 it's stored internally, like jemalloc, the overhead is just possibly
 some page-size-remainder wastage, and free just masks the pointer.
 Either way, if chunks are ever going to be freed, you need some kind
 of header to count free slots.

You're talking about allocators designed around the limitation of an
API. The design no longer needs to make the same compromises if you're
going to know the size. The difference between no cache miss and a cache
miss is not insignificant...

 I guess knowing the size would help the fast path for free be really
 simple and even inlined, since it could just swap a fixed thread-local
 variable.

It's a significant optimization. There's a reason this was included in
the C++ allocator design and is being extended to more of the language
in C++14.

 But is that really worth hanging language features on, one way or the
 other?

Is it really worth designing the language around the micro-optimization
of leaving off a capacity field? Rust's syntax is verbose enough without
needing to convert to and from vector/string builders all the time.





Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Daniel Micay
On 02/04/14 03:13 PM, comex wrote:
 On Wed, Apr 2, 2014 at 12:25 PM, Daniel Micay danielmi...@gmail.com wrote:

 But is that really worth hanging language features on, one
 way or the other?

This also isn't the only optimization lost here. Zero-size allocations
will need to be clamped to one if passing a size to free isn't required.

Why?

Rust uses a non-nullable pointer optimization, where Option<~T> and
similar enums can be stored without a tag. This optimization should also
be extended to types like slices in the future. It applies to the
current `~[T]` but would need to be adapted to a new representation.

It's important to avoid allocating for a zero-size allocation, in order
to save memory for ~Trait with zero-size types and to avoid allocating
in zero-size vectors.

However, this means that a zero-size allocation needs to be represented
as non-null. Rust needs a way of knowing that despite being non-null,
there is no allocated capacity. For example, consider a 0-size slice:

(0x22, 0)

When this is passed to `free`, Rust needs to be sure that a 0-size slice
also has a 0-size capacity. In order to do that, shrink_to_fit() needs
to happen during Vec<T> -> ~[T] conversions.

At the moment, Rust is completely broken in this regard. The following
expression evaluates to None:

Some(~())

I have no sane proposal to fix this beyond passing a size to free.
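
For comparison, here is how the same two properties look in present-day Rust, where `Box<T>` plays the role of `~T`. This is a sketch under that assumption, not a statement about the 2014 compiler; the function names are made up. The `Option` around a box costs no extra space, and boxing a zero-size value still produces a `Some`.

```rust
use std::mem::size_of;

/// True if wrapping a box in `Option` adds no tag word.
fn option_box_is_free() -> bool {
    size_of::<Option<Box<i32>>>() == size_of::<Box<i32>>()
}

/// Boxes a zero-size value and reports whether the result is `Some`.
fn boxed_unit_is_some() -> bool {
    let x: Option<Box<()>> = Some(Box::new(()));
    x.is_some()
}

fn main() {
    println!("tag-free Option<Box<i32>>: {}", option_box_is_free());
    println!("Some(Box::new(())) is Some: {}", boxed_unit_is_some());
}
```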





Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Bill Myers

 
 At the moment, Rust is completely broken in this regard. The following
 expression evaluates to None:
 Some(~())

Ouch, this is a disaster.

Is there a bug filed for this?

Anyway, I don't get your argument about size to free having anything to do with 
fixing it (although I agree that size to free is awesome).

If you don't care about equality (i.e. the fact that &*~() != &*~(), but a == a
where a = &*~()), just return the address of a single private static 1-byte
item for any 0-sized allocation.

If you DO care about equality, then you will need at least an integer 
allocation scheme in all cases on 32-bit platforms, and the real costs are the 
data structures to track that (at least a bit in a bitmask, probably at least 2 
bits for an efficient implementation).
If you can't use the 1-2GB of kernel address space, then you'll also need to 
allocate one byte of actual usable address space (but not committed memory).

On 64-bit platforms, you generally have at least around 2^60-2^63 bytes of 
unusable address space, so you can just increment a pointer pointing there for 
each allocation, at zero cost.

Of course the quick and simple fix is to try to call malloc(0) and if it 
returns NULL, remember that and switch to using malloc(1) instead.



Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Daniel Micay
Clamping `malloc(0)` to `malloc(1)` means that allocations of 0-size
types will no longer be free, which is sad. It's very useful to be able
to meet the requirement of having a trait object and avoid any
memory allocation if there's no state.

The sentinel does work, but adds a branch to *every* free call. It will
not optimize out even for cases where the size is fixed at compile time.
This isn't a significant issue for the default allocator because it will
be complex, but it's a significant issue with a bump/arena allocator, or
a simple free list. It's less overhead than not having a size available
will be, but why not kill two birds with one stone?





Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Huon Wilson
Personally, I'm strongly against using ~[] as return values from
library functions.


Imagine we were in a world where we only had Vec<T> and were adding a new
type OwnedSlice<T> that was (pointer, length) like ~[T]. For how many
library functions would we say it is sensible to throw away the 
capacity information before returning? I don't think anything in libstd 
etc. would have a strong 'yes' answer to this question.



Specifically, I don't see any concrete positives to doing this for
library functions other than "let's keep using ~[T]" and ~[T] and &[T]
having the same in-memory representation (covered below).


Under any scheme I can think of, there are negatives:

1. without calling shrink_to_fit in the conversion, we lose the ability 
to have sized deallocations (covered by others in this thread)


2. if we do call it, then anything returning a ~[T] after building it 
with a Vec<T> is unavoidably slower


3. either way, you're throwing away (the knowledge of) any extra 
capacity that was allocated, so if someone wishes to continue extending 
the slice returned by e.g. `foo`, then `let v = foo().into_vec(); 
v.push(1)` will always require a realloc. (And for library functions, we 
shouldn't be dictating how people use the return values.)


4. it adds two vector-like types that someone needs to think about: in 
the common case the benefits of ~[] (one word smaller) are completely 
useless, it's really only mostly-immutable heavily-nested data types 
with a lot of vectors like Rust's AST where it helps[1]. I.e. almost all
situations are fine (or better) with a Vec.

5. how will the built-in ~[] type use allocators? (well, I guess this is
really "how will the built-in ~ type use allocators?", but that question
still needs answering[2].)
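
Point 3 in the list above can be observed with today's standard library, assuming `Box<[T]>` as the modern stand-in for `~[T]`. `Vec::into_boxed_slice` discards excess capacity, and `into_vec` on the boxed slice yields a vector whose capacity equals its length, so the next push has to reallocate. The function name `freeze_and_thaw` is made up for this sketch.

```rust
/// Builds a vector with spare capacity, converts it to a boxed slice and
/// back, then pushes once. Returns the capacity at each of those stages.
fn freeze_and_thaw() -> (usize, usize, usize) {
    let mut v: Vec<i32> = Vec::with_capacity(10);
    v.extend([1, 2, 3]);
    let cap_before = v.capacity();

    // Dropping the capacity: the spare room is given back here.
    let frozen: Box<[i32]> = v.into_boxed_slice();

    // Converting back: the capacity is simply set to the length.
    let mut thawed = frozen.into_vec();
    let cap_after = thawed.capacity();

    // No spare room is left, so this push must grow the buffer.
    thawed.push(4);
    (cap_before, cap_after, thawed.capacity())
}

fn main() {
    let (before, after, grown) = freeze_and_thaw();
    println!("before={} after={} grown={}", before, after, grown);
}
```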



On the representation of ~[T] and &[T] being the same: this means that
theoretically a ~[T] in covariant(?) position can be coerced to a &[T],
e.g. Vec<~[T]> -> Vec<&[T]>. However, this only really matters for
functions returning many nested slices/vectors, e.g. the same Vec
example, because pretty much anything else will be able to write
`vec.as_slice()` cheaply. (In the code base, the only things mentioning
/~[~[/ now are a few tests and things handling the raw argc/argv, i.e.
returning ~[~[u8]].)


I don't think this should be a major concern, because I don't see us
suddenly growing a pile of new functions returning ~[~[T]],
and if we do, I would think that they would be better suited to being an
iterator (assuming that's possible) over Vec's, and these internal Vec
can then be mapped to ~[T] cheaply before collecting the iterator to
a whole new Vec<Vec> (or Vec<~[]>) (assuming a &[Vec]/&[~[]] is wanted).




I'm concerned we are wanting to stick with ~[T] because it's what we 
currently have, and is familiar; as I said above, I don't see many 
positives for doing it for library functions.





Huon


[1]: And even in those cases, it's not a particularly huge gain, e.g. 
taking *two* words off the old OptVec type by replacing it with a 
library equivalent to DST's ~[T] only gained about 40MB: 
http://huonw.github.io/isrustfastyet/mem/#f5357cf,bbf8cdc


[2]: The sanest way to support allocators I can think of would be 
changing `~T` to `Uniq<T, A=DefaultAlloc>`, and then we have `Uniq<[T]>`
which certainly feels less attractive than `~[T]`.


Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Patrick Walton

On 4/2/14 2:51 PM, Huon Wilson wrote:

I'm concerned we are wanting to stick with ~[T] because it's what we
currently have, and is familiar; as I said above, I don't see many
positives for doing it for library functions.


What about strings? Should we be using `StrBuf` as well?

Patrick



Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Huon Wilson

On 03/04/14 08:54, Patrick Walton wrote:

On 4/2/14 2:51 PM, Huon Wilson wrote:

Specifically, I don't see any concrete positives to doing this for
library functions other than lets keep using ~[T] and ~[T]  [T]
having the same in-memory representation (covered below).

Under any scheme I can think of, there are negatives:

1. without calling shrink_to_fit in the conversion, we lose the ability
to have sized deallocations (covered by others in this thread)

2. if we do call it, then anything returning a ~[T] after building it
with a VecT is unavoidably slower

3. either way, you're throwing away (the knowledge of) any extra
capacity that was allocated, so if someone wishes to continue extending
the slice returned by e.g. `foo`, then `let v = foo().into_vec();
v.push(1)` will always require a realloc. (And for library functions, we
shouldn't be dictating how people use the return values.)

4. it adds two vector-like types that someone needs to think about: in
the common case the benefits of ~[] (one word smaller) are completely
useless, it's really only mostly-immutable heavily-nested data types
with a lot of vectors like Rust's AST where it helps[1]. I.e. almost all
situations are fine (or better) with a Vec.

5. how will the built-in ~[] type use allocators? (well, I guess this is
really how will the built-in ~ type use allocators?, but that question
still needs answering[2].)


On the representation of ~[T] and &[T] being the same: this means that
theoretically a ~[T] in covariant(?) position can be coerced to a &[T],
e.g. Vec<~[T]> -> Vec<&[T]>. However, this only really matters for
functions returning many nested slices/vectors, e.g. the same Vec
example, because pretty much anything else will be able to write
`vec.as_slice()` cheaply. (In the code base, the only things mentioning
/~[~[/ now are a few tests and things handling the raw argc/argv, i.e.
returning ~[~[u8]].)

I don't think this should be a major concern, because I don't see us
suddenly growing a pile of new functions returning ~[~[T]],
and if we do, I would think that they would be better suited to being an
iterator (assuming that's possible) over Vecs, and these internal Vecs
can then be mapped to ~[T] cheaply before collecting the iterator to
a whole new Vec<Vec> (or Vec<~[]>) (assuming a &[Vec]/&[~[]] is wanted).



I'm concerned we are wanting to stick with ~[T] because it's what we
currently have, and is familiar; as I said above, I don't see many
positives for doing it for library functions.


What about strings? Should we be using `StrBuf` as well?

Patrick




I don't see why not. The same arguments apply.


Huon


Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Niko Matsakis
On Wed, Apr 02, 2014 at 04:03:37PM -0400, Daniel Micay wrote:
 I have no sane proposal to fix this beyond passing a size to free.

I don't believe there is a problem with just not using null to
represent such pointers (for example, 1 would suffice). This does
impose some additional burdens on slice conversion and the like.

This conversation has focused on low-level effects, which is important
to understand, but I think the bigger question is: how do we WANT the
language to look? Is it useful to have a distinct `Vec<T>` and `~[T]`
or -- in our ideal world -- would they be the same? I think we can
make the interconversion fast for the default allocator, but we should
design for the language we want to use.

I could go either way on this. In the kind of programs I write, at
least, most vectors get built up to a specific length and then stop
growing (frequently they stop changing as well, but not
always). Sometimes they continue growing. I actually rather like the
idea of using `Vec<T>` as a kind of builder and `~[T]` as the
end-product. In those cases where the vector continues to grow, of
course, I can just keep the `Vec<T>` around. Following this logic, I
would imagine that most APIs want to consume and produce `~[T]`, since
they consume and produce end products.
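
The builder/end-product split described above can be sketched in
present-day Rust. This is only an illustration under stated assumptions:
`Box<[T]>` is used as the closest modern analogue of `~[T]`, and
`squares` is a hypothetical helper, not a standard library function.

```rust
// Sketch of "Vec<T> as builder, fixed-length owned slice as end product".
// Assumption: Box<[T]> stands in for the thread's ~[T].
// `squares` is a hypothetical helper made up for this example.
fn squares(n: u32) -> Box<[u32]> {
    // Growable while the result is being built up.
    let mut builder: Vec<u32> = Vec::new();
    for i in 0..n {
        builder.push(i * i);
    }
    // Freeze into a fixed-length owned slice. This drops any spare
    // capacity and may reallocate in order to do so.
    builder.into_boxed_slice()
}

fn main() {
    let done = squares(5);
    assert_eq!(done.len(), 5);
    assert_eq!(done.to_vec(), vec![0, 1, 4, 9, 16]);
    println!("{:?}", done);
}
```

The caller receives something it can read and mutate in place but cannot
grow without first converting it back to a vector.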

On the other hand, I could imagine and appreciate an argument that we
should just take and produce `Vec<T>`, which gives somewhat more
flexibility. In general, Rust takes the philosophy that if you own
it, you can mutate it, so why make growing harder than it needs to
be? Preferring `Vec<T>` also means fewer choices, usually a good thing.

Perhaps the best thing is to wait a month (or two or three) until DST
is more of a reality and then see how we feel.



Niko


Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Daniel Micay
On 02/04/14 07:22 PM, Niko Matsakis wrote:
 On Wed, Apr 02, 2014 at 04:03:37PM -0400, Daniel Micay wrote:
 I have no sane proposal to fix this beyond passing a size to free.
 
 I don't believe there is a problem with just not using null to
 represent such pointers (for example, 1 would suffice). This does
 impose some additional burdens on slice conversion and the like.

I used a sentinel value in my fix along with providing a guarantee that
`free` is never called on a zero-size allocation. That's the end of any
no-op `Vec<T>` -> `~[T]` conversions since it will need to free a
zero-size allocation. It's not far from just calling `shrink_to_fit`, and
allowing for passing a size to `free`.

https://github.com/mozilla/rust/pull/13267

I don't think there's any way around it without making `~ZeroSizeType`
start allocating memory or losing the `Option<NonNullablePointer>`
optimization.
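
For readers following along today, the properties at stake can be
checked in present-day Rust. This is a sketch under assumptions, not the
2014 implementation: `Box<T>` stands in for `~T`, `Empty` is a
hypothetical zero-size type, and the three helper functions are made up
for the example.

```rust
use std::mem::size_of;

// Hypothetical zero-size type, standing in for `ZeroSizeType` above.
struct Empty;

// The Option-like enum optimization: `None` is encoded as the null
// pointer, so wrapping an owned pointer in Option costs no extra space.
fn option_box_is_pointer_sized() -> bool {
    size_of::<Option<Box<u32>>>() == size_of::<Box<u32>>()
}

// Boxing a zero-size value does not allocate, yet the pointer must
// still be non-null so that it cannot be confused with `None`.
fn zero_size_box_is_non_null() -> bool {
    let b = Box::new(Empty);
    let p: *const Empty = &*b;
    !p.is_null()
}

// An empty vector owns no allocation; its pointer is a non-null
// sentinel rather than null.
fn empty_vec_is_non_null() -> bool {
    let v: Vec<u32> = Vec::new();
    v.capacity() == 0 && !v.as_ptr().is_null()
}

fn main() {
    assert!(option_box_is_pointer_sized());
    assert!(zero_size_box_is_non_null());
    assert!(empty_vec_is_non_null());
}
```

Because the sentinel does not point at a real allocation, it must never
be handed to the deallocator, which is the guarantee described above.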

 This conversation has focused on low-level effects, which is important
 to understand, but I think the bigger question is: how do we WANT the
 language to look? Is it useful to have a distinct `Vec<T>` and `~[T]`
 or -- in our ideal world -- would they be the same? I think we can
 make the interconversion fast for the default allocator, but we should
 design for the language we want to use.

A distinct `~[T]` and `Vec<T>` will make the language more painful to
use, so the only point I'm trying to counter is the performance one
because it *is* a valid micro-optimization in some cases.

If our default allocation scheme takes advantage of a known size, then
it will be faster. I don't think we should keep using a
malloc/realloc/free-style API under the hood in the future.

 I could go either way on this. In the kind of programs I write, at
 least, most vectors get built up to a specific length and then stop
 growing (frequently they stop changing as well, but not
 always). Sometimes they continue growing. I actually rather like the
 idea of using `Vec<T>` as a kind of builder and `~[T]` as the
 end-product. In those cases where the vector continues to grow, of
 course, I can just keep the `Vec<T>` around. Following this logic, I
 would imagine that most APIs want to consume and produce `~[T]`, since
 they consume and produce end products.

The language needs to be providing a significant safety/correctness
guarantee or performance win in exchange for the extra noise and I don't
really think it will be in general. There will be use cases for `~[T]`
but I don't think they will be common.

If an API consumes `~[T]`, it will lose track of capacity the caller may
already be able to provide. If it produces `~[T]`, it will lose track of
capacity the caller may want to use later on.
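
A small sketch of the capacity loss on the producing side, written in
present-day Rust. Assumptions: `Box<[u8]>` stands in for `~[T]`, the
1024-byte reservation is arbitrary, and `produce_boxed` and
`produce_vec` are hypothetical functions invented for the comparison.

```rust
// Hypothetical producer that returns the fixed-length form.
fn produce_boxed() -> Box<[u8]> {
    let mut v = Vec::with_capacity(1024);
    v.extend_from_slice(b"abc");
    // Converting discards the spare capacity (and may reallocate).
    v.into_boxed_slice()
}

// Hypothetical producer that returns the vector itself.
fn produce_vec() -> Vec<u8> {
    let mut v = Vec::with_capacity(1024);
    v.extend_from_slice(b"abc");
    v
}

fn main() {
    // Coming back from the boxed slice, no spare room is known, so the
    // next push has to grow the allocation.
    let mut a = produce_boxed().into_vec();
    assert_eq!(a.capacity(), a.len());
    a.push(b'd');
    assert!(a.capacity() >= 4);

    // The vector kept its reservation, so the push fits in place.
    let mut b = produce_vec();
    assert!(b.capacity() >= 1024);
    b.push(b'd');
    assert_eq!(b, b"abcd".to_vec());
}
```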

 On the other hand, I could imagine and appreciate an argument that we
 should just take and produce `Vec<T>`, which gives somewhat more
 flexibility. In general, Rust takes the philosophy that if you own
 it, you can mutate it, so why make growing harder than it needs to
 be? Preferring `Vec<T>` also means fewer choices, usually a good thing.
 
 Perhaps the best thing is to wait a month (or two or three) until DST
 is more of a reality and then see how we feel.
 
 
 
 Niko






Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Kevin Ballard
On Apr 2, 2014, at 8:35 AM, Alex Crichton a...@crichton.co wrote:

 As a concrete example, I'll take the read_to_end() method on io's Reader
 trait. This type must use a Vec<T> internally to read data into the vector,
 but it will return a ~[T] because the contents are conceptually frozen after
 they have been read.

This concrete example is great, because it precisely illustrates a major 
objection I have to returning ~[T].

Reader.read_to_end() internally uses a 64k-byte vector. It reserves 64k bytes, 
then pushes onto this vector until it hits EOF. Every time it fills up the 64k 
capacity it reserves another chunk and keeps reading (this, btw, is I think 
almost certainly unintended behavior and is fixed by #13127, which changes it 
to always keep 64k of space available for each read rather than potentially 
requesting smaller and smaller reads). Note that because it uses 
reserve_at_least() it may actually have more than 64k available. When EOF is 
reached, this vector is returned to the caller.

The problem I have with returning ~[T] here is that both choices for how to 
deal with this wasted space are terrible:

1. Shrink-to-fit before returning. If I'm going to keep the vector around for a 
long time this is a good idea, but if I'm just going to process the vector and 
throw it away, the reallocation was completely unnecessary.
2. Convert to ~[T] without shrinking. The caller has no way to know about the 
potentially massive amount of wasted space. If I'm going to just process the 
vector and throw it away that's fine, but if I'm going to keep it around for a 
while then this is terrible.

The only reasonable solution is to return the Vec<T> and let the caller decide 
if they want to shrink-to-fit or not.
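
A sketch of what "let the caller decide" could look like in present-day
Rust. Assumptions: `read_all` is a hypothetical wrapper written for this
example, the 64k reservation mirrors the figure above rather than any
actual library constant, and an in-memory byte slice stands in for a
real reader.

```rust
use std::io::{self, Read};

// Hypothetical wrapper: reserve a 64k buffer, read until EOF, and hand
// the vector back with whatever spare capacity it ended up with.
fn read_all<R: Read>(mut reader: R) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(64 * 1024);
    reader.read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() {
    let input: &[u8] = b"hello";

    // Short-lived use: process and drop. No shrink, so no reallocation.
    let tmp = read_all(input).unwrap();
    assert_eq!(tmp.len(), 5);
    assert!(tmp.capacity() >= 64 * 1024);

    // Long-lived use: the caller opts in to releasing the spare space.
    let mut kept = read_all(input).unwrap();
    kept.shrink_to_fit();
    assert_eq!(kept, b"hello".to_vec());
}
```

Neither caller pays for the other's choice, which is the point of
returning the vector rather than a frozen slice.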

-Kevin Ballard


Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Kevin Ballard
On Apr 2, 2014, at 3:01 PM, Huon Wilson dbau...@gmail.com wrote:

 On 03/04/14 08:54, Patrick Walton wrote:
 
 What about strings? Should we be using `StrBuf` as well?
 
 I don't see why not. The same arguments apply.

I agree. I was actually quite surprised to see that the type was named StrBuf; 
I assumed it was going to be Str, just as Vec is not VecBuf.

I'm in full agreement with Huon on this matter. The standard libraries should 
return Vec<T> instead of ~[T] in pretty much every case (the only real 
exception I can think of is Vec<~[T]>, because of the ability to convert to 
Vec<&[T]> or &[&[T]] for free). Similarly I think we should be returning StrBuf 
instead of ~str in all cases. And finally, I think we should just name it Str 
instead of StrBuf.

If developers want to use ~[T] and ~str in their own code, that's fine, but the 
standard libraries should err on the side of preserving information (e.g. 
capacity) and providing a consistent experience. If there's one thing I really 
want to avoid above all else, it's confusing people about whether they should 
be using ~[T] or Vec<T>, because some standard library code uses one and some 
code uses the other.

-Kevin