Re: [rust-dev] Reminder: ~[T] is not going away
On 03/04/14 11:48 PM, Nathan Myers wrote:

> > > Perhaps the best thing is to wait a month (or two or three) until DST is more of a reality and then see how we feel.
> >
> > Are you thinking we should also wait before converting the current uses of ~[T] to Vec<T>? Doing the migration gives us the performance[1] and zero-length-zero-alloc benefits, but there were some concerns about additional library churn if we end up converting back to DST's ~[T].
>
> I can't speak about how a usage choice affects the standard library, but it seems worth mentioning that vector capacity doesn't have to be in the base object; it can live in the secondary storage, prepended before the elements.

Needing to use a header seriously hurts the performance. The new vector is 7x faster at pushing elements when space isn't reserved compared to the old one, all due to leaving off the length/capacity header.

The overhead would be less if it stored the capacity inside *and* outside the vector, but it's still overhead. It's an extra overflow check branch along with needing to calculate padding for alignment in the future, extra space in the memory allocation and more pointer aliasing issues.

> A zero-length Vec<T> might be null for the case of zero capacity, or non-null when it has room to grow.

It's going to be forbidden from actually being null in the future when the Option-like enum optimization is applied to it via an attribute. This work has already landed - calling exchange_free on a zero-size allocation is *forbidden*.

signature.asc
Description: OpenPGP digital signature
___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
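The two properties mentioned in this message (a never-null vector pointer enabling the Option-like enum optimization, and an empty vector needing no allocation) can be observed in present-day Rust. The following is a minimal sketch, assuming a current toolchain rather than the 2014 compiler under discussion; `Box<[T]>` is the present-day spelling of `~[T]`, and the helper names `option_is_free` and `empty_vec_capacity` are hypothetical.

```rust
use std::mem::size_of;

/// Returns true when wrapping `T` in `Option` costs no extra space,
/// i.e. the compiler can encode `None` in a bit pattern `T` never uses
/// (for pointer-based types, the null pointer).
pub fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

/// Capacity reported by a freshly created, empty vector.
/// `Vec::new` does not allocate, so this is zero for ordinary element types.
pub fn empty_vec_capacity() -> usize {
    Vec::<u64>::new().capacity()
}
```

On current compilers `option_is_free::<Vec<u64>>()` and `option_is_free::<Box<[u64]>>()` hold, while a plain `usize` has no spare bit pattern and so `Option<usize>` is larger.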
Re: [rust-dev] Reminder: ~[T] is not going away
> Needing to use a header seriously hurts the performance. The new vector is 7x faster at pushing elements when space isn't reserved compared to the old one, all due to leaving off the length/capacity header.
>
> The overhead would be less if it stored the capacity inside *and* outside the vector, but it's still overhead. It's an extra overflow check branch along with needing to calculate padding for alignment in the future, extra space in the memory allocation and more pointer aliasing issues.

Perhaps I am not understanding you correctly. Assuming that the capacity is stored inside and outside Vec, the only overhead I see is during allocation/deallocation. Otherwise the code will be identical.

If you are worried about space, there is a cost of passing around Vecs (vs ~[T]), which consumes an extra register for the capacity.

> It's going to be forbidden from actually being null in the future when the Option-like enum optimization is applied to it via an attribute. This work has already landed - calling exchange_free on a zero-size allocation is *forbidden*.

As mentioned elsewhere on this thread, we can use another invalid pointer value to represent either Option-None or 0 capacity depending on which is more efficient.

Manu
Re: [rust-dev] Reminder: ~[T] is not going away
On 04/04/14 10:51 AM, Manu Thambi wrote:

> > Needing to use a header seriously hurts the performance. The new vector is 7x faster at pushing elements when space isn't reserved compared to the old one, all due to leaving off the length/capacity header.
> >
> > The overhead would be less if it stored the capacity inside *and* outside the vector, but it's still overhead. It's an extra overflow check branch along with needing to calculate padding for alignment in the future, extra space in the memory allocation and more pointer aliasing issues.
>
> Perhaps I am not understanding you correctly. Assuming that the capacity is stored inside and outside Vec, the only overhead I see is during allocation/deallocation. Otherwise the code will be identical.

It bloats the code size by requiring extra overflow checks in functions like `push`, which impacts performance. Unwinding prevents many LLVM passes from doing their job, since it adds significant complexity to the control flow.

In addition to this, there is even an impact on the performance of immutable operations like indexing. There's a need to calculate the offset to the first element in the vector, which includes compensating for alignment because there can be padding in between the capacity and the first element in the vector.

You can deny that this has performance implications, but the fact is that I have looked at the performance and code size impact in depth and have hard numbers from benchmarks proving that there is an enormous performance overhead for this choice.

> If you are worried about space, there is a cost of passing around Vecs (vs ~[T]), which consumes an extra register for the capacity.

Passing vectors around by-value isn't a common operation. In the common case, functions operate on mutable or immutable borrowed slices. In uncommon cases, they operate on `&mut Vec<T>` in order to change the length in place. There are rare cases when ownership needs to be moved, but it's rare for it not to correspond by a constant factor to the number of allocations.

> > It's going to be forbidden from actually being null in the future when the Option-like enum optimization is applied to it via an attribute. This work has already landed - calling exchange_free on a zero-size allocation is *forbidden*.
>
> As mentioned elsewhere on this thread, we can use another invalid pointer value to represent either Option-None or 0 capacity depending on which is more efficient.

I've already implemented support for this in the compiler some time ago and the library portion is now in master. This means it's invalid to call exchange_free on an allocation with a zero size capacity, so slices need to track whether the allocation is zero size. A zero size length does not imply a zero size capacity unless `Vec<T>` -> `~[T]` is not a no-op, which is what I am saying.

Commits:

1778b6361627c5894bf75ffecf427573af02d390
898669c4e203ae91e2048fb6c0f8591c867bccc6
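The point about how functions usually receive vectors can be illustrated in present-day Rust. This is a small sketch with hypothetical function names, not code from the standard library: read-only and in-place operations take borrowed slices, and only an operation that changes the length needs a mutable reference to the vector itself.

```rust
/// Read-only access: an immutable borrowed slice is enough.
pub fn total(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

/// In-place mutation that keeps the length: a mutable slice is enough.
pub fn double_all(xs: &mut [i32]) {
    for x in xs.iter_mut() {
        *x *= 2;
    }
}

/// Changing the length requires a mutable reference to the vector,
/// because only the vector knows its capacity and can reallocate.
pub fn push_total(v: &mut Vec<i32>) {
    let t = total(v.as_slice());
    v.push(t);
}
```

None of these signatures passes the three-word vector by value, so the extra capacity word does not travel through the call in the common case.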
Re: [rust-dev] Reminder: ~[T] is not going away
On 04/04/14 01:50 PM, Manu Thambi wrote:

> As Nathan mentioned, the capacity is stored at a negative offset to the pointer to the heap.

Storing at a negative index removes the cost at indexing, but not elsewhere. It still consumes more memory and makes `push` slower, especially since it has to do more than one offset based on alignment with at least one overflow check.

> So the Vec code should be identical, except that during allocation/re-allocation, we need to compute the heap pointer by adding sizeof(uint) to the value returned by malloc() (and the opposite computation on free()). Indexing, etc, will not change from how it is done now.

It has to check for overflow on any addition like this. The inability to pass a size to `dealloc` is not going to be free either. Teaching LLVM to understand the pointer gymnastics here means trying to make it simpler rather than allowing it to become more complicated.

> > Passing vectors around by-value isn't a common operation. In the common case, functions operate on mutable or immutable borrowed slices. In uncommon cases, they operate on `&mut Vec<T>` in order to change the length in place. There are rare cases when ownership needs to be moved, but it's rare for it not to correspond by a constant factor to the number of allocations.
>
> I agree that passing around Vec by value is uncommon. But you seem to be concerned about Vec<T> -> ~[T] performance, which should also be a rare transfer of ownership.

I'm not at all concerned about it. I think it would be a huge mistake to use `~[T]` frequently at all, and I'm simply pointing out that this is not going to be a no-op because that claim was made several times.

> > I've already implemented support for this in the compiler some time ago and the library portion is now in master. This means it's invalid to call exchange_free on an allocation with a zero size capacity, so slices need to track whether the allocation is zero size. A zero size length does not imply a zero size capacity unless `Vec<T>` -> `~[T]` is not a no-op, which is what I am saying.
> >
> > Commits:
> >
> > 1778b6361627c5894bf75ffecf427573af02d390
> > 898669c4e203ae91e2048fb6c0f8591c867bccc6
>
> I understand that we cannot call free with a zero size/capacity. There are three possibilities:
>
> a) Use the special pointer value to represent Option::None. The Vec<T> -> ~[T] would be a no-op.

An empty vector is not the same as `None`. Reserving an address is also not possible in all environments Rust is going to be used in as a language, and I think it should be up to the allocator implementation rather than hard-coded knowledge in the compiler. At the moment, the `Some(~())` problem is fixed with no overhead anywhere, and allocators have the choice between a sentinel and clamping zero-size allocations to 1.

> b) If that makes implementation of Option complicated, then use the special pointer value to represent a zero capacity. We can use that special value in Vec<T> as well, even though it is not needed. This will keep Vec<T> -> ~[T] a no-op.

This will add a branch to every deallocation call.

> c) Conversion between Vec<T> -> ~[T] is not likely to be common. So, doing an additional check is okay?

It's not about there being an additional check. It's about it having to drop excess capacity, which will make conversions to and from `~[T]` hurt. This can easily result in higher time complexity rather than just a constant factor slowdown.

I don't think conversion from `Vec<T>` -> `~[T]` is important, and I just want to make it clear that there's no way it is going to be free. The cost can not simply be hand-waved away by moving it elsewhere, such as requiring new branches and losing the ability to pass a size to `dealloc`.
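The cost described here, dropping excess capacity on conversion, is visible in present-day Rust, where `Box<[T]>` plays the role of `~[T]`. The sketch below is an illustration under that assumption; the helper names are hypothetical, while `Vec::into_boxed_slice` and the slice method `into_vec` are existing standard library calls.

```rust
/// Converts a vector into an owned slice. The standard library documents
/// that this discards excess capacity first, which may reallocate.
pub fn freeze(v: Vec<u32>) -> Box<[u32]> {
    v.into_boxed_slice()
}

/// Builds a vector with `reserve` slots, fills it with `items`, then
/// round-trips it through the owned-slice type. Returns the capacity
/// before and after the round trip.
pub fn capacity_after_roundtrip(reserve: usize, items: &[u32]) -> (usize, usize) {
    let mut v = Vec::with_capacity(reserve);
    v.extend_from_slice(items);
    let before = v.capacity();
    // The owned slice stores only pointer and length, so the reserved
    // space cannot survive the conversion.
    let after = freeze(v).into_vec().capacity();
    (before, after)
}
```

A vector that reserved 16 slots and holds 3 elements comes back with room for exactly 3, so a later `push` has to reallocate. Doing such a round trip inside a loop is how the conversion can change the time complexity rather than just add a constant factor.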
Re: [rust-dev] Reminder: ~[T] is not going away
On 04/04/2014 02:51 PM, Daniel Micay wrote:

> Storing at a negative index removes the cost at indexing, but not elsewhere. It still consumes more memory and makes `push` slower, especially since it has to do more than one offset based on alignment with at least one overflow check.

In the negative index scheme, the length and capacity in the Vec would be identical to what it is in the current implementation. Hence the code will be identical, except for while allocating/deallocating (i.e., push() would have the same performance).

> It has to check for overflow on any addition like this. The inability to pass a size to `dealloc` is not going to be free either. Teaching LLVM to understand the pointer gymnastics here means trying to make it simpler rather than allowing it to become more complicated.

I don't understand what addition you mean? The only time you need the size stored in the negative index is to call dealloc.

You absolutely can pass size into dealloc while destructing ~[T]. Just use the size stored in the negative index.

> I'm not at all concerned about it. I think it would be a huge mistake to use `~[T]` frequently at all, and I'm simply pointing out that this is not going to be a no-op because that claim was made several times.

It will be a no-op, if you use null (0) to indicate 0-capacity, and a special value (1?) to indicate Option::None.

> An empty vector is not the same as `None`. Reserving an address is also not possible in all environments Rust is going to be used in as a language, and I think it should be up to the allocator implementation rather than hard-coded knowledge in the compiler. At the moment, the `Some(~())` problem is fixed with no overhead anywhere, and allocators have the choice between a sentinel and clamping zero-size allocations to 1.

Can you name one architecture where we are not able to find a single extra invalid virtual address other than 0?

Just to be clear, the negative index scheme will allow free() to take the size argument.

> > b) If that makes implementation of Option complicated, then use the special pointer value to represent a zero capacity. We can use that special value in Vec<T> as well, even though it is not needed. This will keep Vec<T> -> ~[T] a no-op.
>
> This will add a branch to every deallocation call.

No it wouldn't. Vec doesn't have to check the pointer. Just check the capacity.

> > c) Conversion between Vec<T> -> ~[T] is not likely to be common. So, doing an additional check is okay?
>
> It's not about there being an additional check. It's about it having to drop excess capacity, which will make conversions to and from `~[T]` hurt. This can easily result in higher time complexity rather than just a constant factor slowdown.
>
> I don't think conversion from `Vec<T>` -> `~[T]` is important, and I just want to make it clear that there's no way it is going to be free. The cost can not simply be hand-waved away by moving it elsewhere, such as requiring new branches and losing the ability to pass a size to `dealloc`.

The negative index scheme does not require you to drop excess capacity.

With this scheme, ~[T] and Vec<T> would contain the same amount of info. The only difference is that in ~[T], the capacity is stored at a negative index. In Vec<T>, capacity is stored both inline and at the negative index.

The only overhead would be a couple of checks/additions during allocation/deallocation. Everything else would perform exactly as it does now.

Manu
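The disagreement over what a capacity header costs at allocation time can be made concrete with the layout arithmetic involved. The sketch below uses present-day `std::alloc::Layout` and is only an illustration of the bookkeeping, not the scheme either participant implemented; `header_layout` and the over-aligned type `Wide` are hypothetical names.

```rust
use std::alloc::{Layout, LayoutError};

/// A hypothetical element type with a large alignment requirement.
#[repr(align(32))]
pub struct Wide(pub [u8; 32]);

/// Computes the layout of a single allocation holding a `usize` capacity
/// header followed by `cap` elements of `T`. Returns the combined layout
/// and the byte offset of the first element from the start of the block.
pub fn header_layout<T>(cap: usize) -> Result<(Layout, usize), LayoutError> {
    let header = Layout::new::<usize>();
    // Fails if `cap * size_of::<T>()` overflows.
    let elems = Layout::array::<T>(cap)?;
    // Inserts any padding needed so the elements are properly aligned,
    // and fails if the total size overflows.
    header.extend(elems)
}
```

For byte elements the first element sits directly after the header, but for `Wide` the offset grows to 32 bytes because of alignment padding. Both steps are fallible, which corresponds to the overflow checks mentioned in the thread; whether they affect `push` in practice is the point under dispute and is not something this sketch measures.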
Re: [rust-dev] Reminder: ~[T] is not going away
On 04/04/14 04:12 PM, Manu Thambi wrote:

> On 04/04/2014 02:51 PM, Daniel Micay wrote:
>
> > Storing at a negative index removes the cost at indexing, but not elsewhere. It still consumes more memory and makes `push` slower, especially since it has to do more than one offset based on alignment with at least one overflow check.
>
> In the negative index scheme, the length and capacity in the Vec would be identical to what it is in the current implementation. Hence the code will be identical, except for while allocating/deallocating (i.e., push() would have the same performance).

It won't have the same performance, because the performance hit comes from the code size increase needed to handle offsetting and overflow checking along with aliasing issues. It was slow because a header involves offsets and overflow checks. It also screws up the alias analysis. The negative index solution suffers from this almost as much as the old vector representation.

I feel I've made the reasons why it's slower clear and you simply don't believe what I said. The performance gains from removing the header from vectors weren't imaginary. Even a better implementation than the one in `std::slice` is still slower.

> > It has to check for overflow on any addition like this. The inability to pass a size to `dealloc` is not going to be free either. Teaching LLVM to understand the pointer gymnastics here means trying to make it simpler rather than allowing it to become more complicated.
>
> I don't understand what addition you mean? The only time you need the size stored in the negative index is to call dealloc.
>
> You absolutely can pass size into dealloc while destructing ~[T]. Just use the size stored in the negative index.

You can pass it for the negative index proposal, but not the other proposals. The negative index proposal involves bloating the `Vec<T>` type to micro-optimize what is going to be an incredibly rare conversion, while the other proposals lose the ability to pass the length. I don't see a valid reason to change the status quo.

> > I'm not at all concerned about it. I think it would be a huge mistake to use `~[T]` frequently at all, and I'm simply pointing out that this is not going to be a no-op because that claim was made several times.
>
> It will be a no-op, if you use null (0) to indicate 0-capacity, and a special value (1?) to indicate Option::None.

You can't use a special value to indicate None without adding a lang item; no other pointer values are specified by Rust or LLVM as being invalid.

> > An empty vector is not the same as `None`. Reserving an address is also not possible in all environments Rust is going to be used in as a language, and I think it should be up to the allocator implementation rather than hard-coded knowledge in the compiler. At the moment, the `Some(~())` problem is fixed with no overhead anywhere, and allocators have the choice between a sentinel and clamping zero-size allocations to 1.
>
> Can you name one architecture where we are not able to find a single extra invalid virtual address other than 0?

Whether or not *I* can name such an architecture doesn't matter. Rust is meant to be a portable language, even to platforms this specific contributor is not familiar with. This would add a dependency on global variables for unique pointers, even though you could implement them in an environment with only a stack using a fixed-size pool.

> Just to be clear, the negative index scheme will allow free() to take the size argument.

I'm talking about all of the proposed solutions such as the ones at the end of your message in isolation from the proposal to require `Vec<T>` to have a header (not going to happen).

> > > b) If that makes implementation of Option complicated, then use the special pointer value to represent a zero capacity. We can use that special value in Vec<T> as well, even though it is not needed. This will keep Vec<T> -> ~[T] a no-op.
> >
> > This will add a branch to every deallocation call.
>
> No it wouldn't. Vec doesn't have to check the pointer. Just check the capacity.

Checking the capacity is a branch.

> The only overhead would be a couple of checks/additions during allocation/deallocation. Everything else would perform exactly as it does now.

It will cause `push` to perform worse than it does now and it will cause `Vec<T>` to allocate more memory. All to micro-optimize a conversion to a nearly useless type.

I've made it clear why adding headers to vectors decreases the performance. You clearly don't believe me and I won't be wasting my time on this thread anymore.
Re: [rust-dev] Reminder: ~[T] is not going away
Most of your comments do not apply to a properly implemented negative index scheme. So, it seems clear to me that I haven't been able to get it across to you.

I guess we can both agree that spending more time on this thread is unproductive, especially since the real question is whether we would *want* to have ~[T] used.

Thank you for your time.

Manu

--
Manu Thambi
Mesh Capital, LLC
Re: [rust-dev] Reminder: ~[T] is not going away
On Wed, Apr 2, 2014 at 9:21 PM, Daniel Micay danielmi...@gmail.com wrote:

> I used a sentinel value in my fix along with providing a guarantee that `free` is never called on zero-size allocation. That's the end of any no-op `Vec<T>` -> `~[T]` conversions since it will need to free a zero size allocation. It's not far from just calling `shrink_to_fit`, and allowing for passing a size to `free`.
>
> https://github.com/mozilla/rust/pull/13267

I see the benefit of free knowing the size in this case, although it seems that it would strongly call for type-level integers to avoid needing a special case in the compiler.

I don't think this issue necessarily guarantees Vec<T> can't be freely converted to ~[T]. You could hypothetically special case allocations for zero-sized types, while keeping all other allocations real (including zero sized, since the impact would be minimal).

> You're talking about allocators designed around the limitation of an API. The design no longer needs to make the same compromises if you're going to know the size. The difference between no cache miss and a cache miss is not insignificant...

I explained why I think a chunk header is necessary in any case. Maybe it is still a significant win. The C++14 proposal claims Google found one with GCC and tcmalloc, although tcmalloc is rather inefficient to start with... I would like to see numbers.

Then again, I agree with the other reasons that using ~[T] is a bad idea, so I have no particular reason to disagree with having the size parameter either.
Re: [rust-dev] Reminder: ~[T] is not going away
On 03/04/14 17:15, comex wrote:

> > You're talking about allocators designed around the limitation of an API. The design no longer needs to make the same compromises if you're going to know the size. The difference between no cache miss and a cache miss is not insignificant...
>
> I explained why I think a chunk header is necessary in any case. Maybe it is still a significant win. The C++14 proposal claims Google found one with GCC and tcmalloc, although tcmalloc is rather inefficient to start with... I would like to see numbers.

Really? I was under the impression that tcmalloc was one of the faster allocators in common use, e.g. two posts I found just now via Google:

- https://github.com/blog/1422-tcmalloc-and-mysql
- http://www.mysqlperformanceblog.com/2013/03/08/mysql-performance-impact-of-memory-allocators-part-2/

Huon
Re: [rust-dev] Reminder: ~[T] is not going away
On 03/04/14 02:15 AM, comex wrote:

> On Wed, Apr 2, 2014 at 9:21 PM, Daniel Micay danielmi...@gmail.com wrote:
>
> > I used a sentinel value in my fix along with providing a guarantee that `free` is never called on zero-size allocation. That's the end of any no-op `Vec<T>` -> `~[T]` conversions since it will need to free a zero size allocation. It's not far from just calling `shrink_to_fit`, and allowing for passing a size to `free`.
> >
> > https://github.com/mozilla/rust/pull/13267
>
> I see the benefit of free knowing the size in this case, although it seems that it would strongly call for type-level integers to avoid needing a special case in the compiler.

I'm not sure how type-level integers help. This size is often a dynamic one and this doesn't involve any special cases in the compiler. The conversion between Vec<T> and ~[T] is entirely a library feature.

> I don't think this issue necessarily guarantees Vec<T> can't be freely converted to ~[T]. You could hypothetically special case allocations for zero-sized types, while keeping all other allocations real (including zero sized, since the impact would be minimal).

Vec<T> won't be convertible to ~[T] with a no-op after the fix for `Some(~())` lands:

https://github.com/mozilla/rust/pull/13267

It will need to free the allocation if it is zero-size. Calling `shrink_to_fit()` isn't far from that and allows passing the length to the free function. Extending the Option-like enum optimization to other types like `Rc<T>` and `Vec<T>` is planned so this issue applies to them too.

> > You're talking about allocators designed around the limitation of an API. The design no longer needs to make the same compromises if you're going to know the size. The difference between no cache miss and a cache miss is not insignificant...
>
> I explained why I think a chunk header is necessary in any case. Maybe it is still a significant win. The C++14 proposal claims Google found one with GCC and tcmalloc, although tcmalloc is rather inefficient to start with... I would like to see numbers.
>
> Then again, I agree with the other reasons that using ~[T] is a bad idea, so I have no particular reason to disagree with having the size parameter either.
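Passing the size to the free function, as discussed in this message, is how the raw allocation API of present-day Rust works: `std::alloc::dealloc` takes the same `Layout` that was used to allocate, and requesting a zero-size allocation from the global allocator is not permitted. The following is a minimal sketch under those assumptions; `sum_in_raw_buffer` is a hypothetical helper written for illustration.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Allocates a raw buffer of `n` u64 values, fills it with 0..n, sums it,
/// and frees it again, passing the size and alignment back on deallocation.
/// Returns `None` for `n == 0`, since zero-size allocations must never
/// reach the allocator, and also when the allocation fails.
pub fn sum_in_raw_buffer(n: usize) -> Option<u64> {
    if n == 0 {
        return None;
    }
    let layout = Layout::array::<u64>(n).ok()?;
    // SAFETY: `layout` has non-zero size, every write stays within the
    // `n` elements allocated, and `dealloc` receives the same layout.
    unsafe {
        let ptr = alloc(layout) as *mut u64;
        if ptr.is_null() {
            return None;
        }
        let mut sum = 0u64;
        for i in 0..n {
            ptr.add(i).write(i as u64);
            sum += ptr.add(i).read();
        }
        dealloc(ptr as *mut u8, layout);
        Some(sum)
    }
}
```

Because the caller must supply the layout on deallocation, an owning type has to know its true capacity at drop time, which is why a length-only owned slice has to shed excess capacity when it is created.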
Re: [rust-dev] Reminder: ~[T] is not going away
On Wed, Apr 02, 2014 at 09:21:56PM -0400, Daniel Micay wrote:

> ...A distinct `~[T]` and `Vec<T>` will make the language more painful to use...

This is precisely the matter of debate, isn't it? I personally see two sides to this, which is why I was suggesting that maybe we should wait until we can gain a bit more experience before making a final decision here.

Niko
Re: [rust-dev] Reminder: ~[T] is not going away
On 03/04/14 10:22, Niko Matsakis wrote:

> On Wed, Apr 02, 2014 at 04:03:37PM -0400, Daniel Micay wrote:
>
> > I have no sane proposal to fix this beyond passing a size to free.
>
> I don't believe there is a problem with just not using null to represent such pointers (for example, 1 would suffice). This does impose some additional burdens on slice conversion and the like.
>
> This conversation has focused on low-level effects, which is important to understand, but I think the bigger question is: how do we WANT the language to look? Is it useful to have a distinct `Vec<T>` and `~[T]` or -- in our ideal world -- would they be the same? I think we can make the interconversion fast for the default allocator, but we should design for the language we want to use.
>
> I could go either way on this. In the kind of programs I write, at least, most vectors get built up to a specific length and then stop growing (frequently they stop changing as well, but not always). Sometimes they continue growing. I actually rather like the idea of using `Vec<T>` as a kind of builder and `~[T]` as the end-product. In those cases where the vector continues to grow, of course, I can just keep the `Vec<T>` around. Following this logic, I would imagine that most APIs want to consume and produce `~[T]`, since they consume and produce end products.

I don't think the basic routines returning vectors in libstd etc. are producing end-products; they are fundamental building blocks, and their output will be used in untold ways. (There are not many that consume `~[T]`s by-value.)

> On the other hand, I could imagine and appreciate an argument that we should just take and produce `Vec<T>`, which gives somewhat more flexibility. In general, Rust takes the philosophy that if you own it, you can mutate it, so why make growing harder than it needs to be? Preferring Vec<T> also means fewer choices, usually a good thing.
>
> Perhaps the best thing is to wait a month (or two or three) until DST is more of a reality and then see how we feel.

Are you thinking we should also wait before converting the current uses of ~[T] to Vec<T>? Doing the migration gives us the performance[1] and zero-length-zero-alloc benefits, but there were some concerns about additional library churn if we end up converting back to DST's ~[T].

(I'd also guess doing a complete migration now would make the transition slightly easier: no need for staging the libstd changes, and it would allow the current ~[] handling to be removed from libsyntax/librustc completely, leaving a slightly cleaner slate.)

Huon

[1]: https://github.com/mozilla/rust/issues/8981
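The "builder versus end-product" split debated in this message can be sketched in present-day Rust, with `Box<[T]>` standing in for `~[T]`. The function names below are hypothetical and the sketch takes no side in the API question; it only shows what each choice asks of the caller.

```rust
/// An API in the "end-product" style: builds with a vector internally,
/// then returns a fixed-length owned slice.
pub fn squares(n: u64) -> Box<[u64]> {
    let mut out = Vec::new();
    for i in 0..n {
        out.push(i * i);
    }
    out.into_boxed_slice()
}

/// A caller that wants to keep growing the result has to convert it
/// back into a vector first.
pub fn squares_plus(n: u64, extra: u64) -> Vec<u64> {
    let mut v = squares(n).into_vec();
    v.push(extra);
    v
}
```

If `squares` returned `Vec<u64>` instead, `squares_plus` would need no conversion, which is the flexibility argument; returning the owned slice instead signals that the length is final.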
Re: [rust-dev] Reminder: ~[T] is not going away
On 03/04/14 01:22 PM, Ziad Hatahet wrote:

> Would it be useful to look at what other languages are doing? For instance, slices in Go are appendable, so perhaps it would be worth looking at code bases written in Go to see how they deal with slices, or how often they append to slices returned from standard library routines.
>
> -- Ziad

Go doesn't have an equivalent to what `~[T]` will be.

`std::unique_ptr<T[]>` is rarely used in C++, and exists solely for interoperability with legacy code. This is a common use case for `std::unique_ptr` in C++, which is why it takes a destructor parameter. For example, a lone function returning a `FILE *` pointer can be dealt with by doing `auto file = make_unique(get_file(), fclose)`, which gives you a `std::unique_ptr<FILE, decltype(fclose)>`.
Re: [rust-dev] Reminder: ~[T] is not going away
On Thu, Apr 3, 2014 at 11:09 AM, Daniel Micay danielmi...@gmail.com wrote:
> Go doesn't have an equivalent to what `~[T]` will be.

Which was my point. From what I understand, Go's slices are analogous to Rust's Vec<T> in that they are growable. So I was suggesting perusing existing Go code bases to see how often slices returned from standard library routines are appended to; which seems to be one motivator for ~[T].

-- Ziad
Re: [rust-dev] Reminder: ~[T] is not going away
>> Perhaps the best thing is to wait a month (or two or three) until DST is more of a reality and then see how we feel.
>
> Are you thinking we should also wait before converting the current uses of ~[T] to Vec<T>? Doing the migration gives us the performance[1] and zero-length-zero-alloc benefits, but there were some concerns about additional library churn if we end up converting back to DST's ~[T].

I can't speak about how a usage choice affects the standard library, but it seems worth mentioning that vector capacity doesn't have to be in the base object; it can live in the secondary storage, prepended before the elements. A zero-length Vec<T> might be null for the case of zero capacity, or non-null when it has room to grow.

For maximally trivial conversion to ~[T], the pointer in Vec<T> would point to the first element, with the capacity at a negative offset.

Nathan Myers
Re: [rust-dev] Reminder: ~[T] is not going away
On 02/04/14 11:35 AM, Alex Crichton wrote:
> I've noticed recently that there seems to be a bit of confusion about the fate of ~[T] with an impending implementation of DST on the horizon. This has been accompanied with a number of pull requests to completely remove many uses of ~[T] throughout the standard distribution. I'd like to take some time to straighten out what's going on with Vec<T> and ~[T].

I think this is a difference of opinion, not confusion. The original pull requests switching `~[T]` to `Vec<T>` were done by pcwalton, and this was with full knowledge of the plans for `~[T]`.

> # Vec<T>
>
> In a post-DST world, Vec<T> will be the vector builder type. It will be the only type for building up a block of contiguous elements. This type exists today, and lives inside of std::vec. Today, you cannot index Vec<T>, but this will be enabled in the future once the indexing traits are fleshed out.

It will be Rust's vector (dynamic array) type. I don't think it makes sense to call it a 'builder' any more than it makes sense to call `HashMap<K, V>` a 'hash table builder'. It makes something simple far more complicated than it needs to be.

> This type will otherwise largely not change from what it is today. It will continue to occupy three words in memory, and continue to have the same runtime semantics.
>
> # ~[T]
>
> The type ~[T] will still exist in a post-DST world, but its representation will change. Today, a value of type ~[T] is one word (I'll elide the details of this for now). After DST is implemented, ~[T] will be a two-word value of the length and a pointer to an array (similarly to what slices are today). The ~[T] type will continue to have move semantics, and you can borrow it to &[T] as usual.

The `~[T]` type will exist because `[T]` will exist as a type. It won't be an explicit choice to support having it. Some of us consider it an unfortunate consequence of DST rather than a useful type.

> The major difference between today's ~[T] type and a post-DST ~[T] is that the push() method will be removed. There is no knowledge of a capacity in the representation of a ~[T] value, so a push could not be supported at all. In theory a pop() can be efficiently supported, but it will likely not be implemented at first.

A `pop` or `shift` function is impossible to implement efficiently if allocators require a size to be passed to `free`.

> # [T]
>
> As part of DST, the type grammar will start accepting [T] as a possible substitute for type parameters. This basically means that if your type parameter is T, then [U] can satisfy the type parameter. While possible, I imagine that it will be rare for this to appear in APIs. This is an unsized type, which means that what you can do with it is more limited than with a sized type.
>
> The full details of [T] will become apparent once DST is implemented, but it's safe to say that APIs and usage should rarely have to deal with this type, and it will likely be mostly transparent.
>
> # Converting between Vec<T> and ~[T]
>
> Conversions between these two types will be provided, and the default implementations will be free. Converting from Vec<T> to ~[T] will be simply forgetting the capacity, and converting from ~[T] to Vec<T> will set the capacity to the length.

Converting from `Vec<T>` to `~[T]` will not be free with an efficient allocation scheme. I don't think Rust will want to be using a legacy `malloc`/`free` style API as the underlying default allocator in the future. I see it only as a temporary measure before a modern allocation model is implemented.

Without a size parameter to `free`, an allocator needs to track the size of allocations manually. It increases the memory overhead, along with adding bookkeeping overhead. C++ allocators take a `size` parameter to the `deallocate` function for this reason and I expect Rust will want to do the same.

The design of `malloc` and `free` is far from ideal, because the length is either known statically or dynamically in nearly every case. I think leaving out the capacity field of vectors in some cases without dropping the excess capacity is an insignificant micro-optimization. In contrast, passing the length to `free` is quite valuable and will result in a measurable performance win across nearly all Rust code with an allocator taking advantage of it.

> Helper methods will likely be provided to perform a forceful reallocating shrink when going from Vec<T> to ~[T], but it will not be the default.

It has to be the *only* way to do it if Rust is going to be able to switch to an efficient allocation model in the future. The API of `malloc`, `realloc` and `free` is purely a legacy wart and shouldn't drive the design of a new language/library.

> ## The cost of Vec<T> => ~[T]
>
> Some concerns have been brought up that this can in theory be a costly transition under the assumption that this does a reallocation of memory to shrink the capacity to exactly the length. This will likely not be the
Re: [rust-dev] Reminder: ~[T] is not going away
On 4/2/14 9:25 AM, Daniel Micay wrote:
> On 02/04/14 11:35 AM, Alex Crichton wrote:
>> I've noticed recently that there seems to be a bit of confusion about the fate of ~[T] with an impending implementation of DST on the horizon. This has been accompanied with a number of pull requests to completely remove many uses of ~[T] throughout the standard distribution. I'd like to take some time to straighten out what's going on with Vec<T> and ~[T].
>
> I think this is a difference of opinion, not confusion. The original pull requests switching `~[T]` to `Vec<T>` were done by pcwalton, and this was with full knowledge of the plans for `~[T]`.

It was transitionary. I thought that we would have to fully extract `~[T]` from the language before DST would work, but it now seems likely that that won't need to happen.

> The `~[T]` type will exist because `[T]` will exist as a type. It won't be an explicit choice to support having it. Some of us consider it an unfortunate consequence of DST rather than a useful type.

Even if you buy that `~[T]` is useless (which I'm not sure I do), it's no more unfortunate than the fact that the type system allows useless types like `Rc<Rc<Rc<int>>>` is unfortunate.

> If `~[T]` remains used throughout the libraries, Rust will become noisier than languages like C++ with a unified vector type. The need to convert between `Vec<T>` and `~[T]` would add noise to lots of code, without adding any measurable optimization win. A micro-optimization shouldn't drive the design of the libraries, especially when it will prevent making a significant *macro*-optimization (passing a length to the deallocation function).

In practice C++ libraries use their own custom vector types all over the place, so I wouldn't say that Rust is going to be significantly noisier no matter what we do. Interoperability between different libraries is not a strong point of C++.

Besides, C++ has this too, with `unique_ptr<T[]>`. This Stack Overflow answer is actually pretty illuminating:

http://stackoverflow.com/questions/16711697/is-there-any-use-for-unique-ptr-with-array

I think that length-frozen owned vectors are likely to be surprisingly common. We'll see.

Patrick
Re: [rust-dev] Reminder: ~[T] is not going away
On 02/04/14 02:28 PM, Patrick Walton wrote:
> On 4/2/14 9:25 AM, Daniel Micay wrote:
>> On 02/04/14 11:35 AM, Alex Crichton wrote:
>>> I've noticed recently that there seems to be a bit of confusion about the fate of ~[T] with an impending implementation of DST on the horizon. This has been accompanied with a number of pull requests to completely remove many uses of ~[T] throughout the standard distribution. I'd like to take some time to straighten out what's going on with Vec<T> and ~[T].
>>
>> I think this is a difference of opinion, not confusion. The original pull requests switching `~[T]` to `Vec<T>` were done by pcwalton, and this was with full knowledge of the plans for `~[T]`.
>
> It was transitionary. I thought that we would have to fully extract `~[T]` from the language before DST would work, but it now seems likely that that won't need to happen.
>
>> The `~[T]` type will exist because `[T]` will exist as a type. It won't be an explicit choice to support having it. Some of us consider it an unfortunate consequence of DST rather than a useful type.
>
> Even if you buy that `~[T]` is useless (which I'm not sure I do), it's no more unfortunate than the fact that the type system allows useless types like `Rc<Rc<Rc<int>>>` is unfortunate.

No one is proposing that we use `Rc<Rc<Rc<int>>>` in the standard library. Using `~[T]` instead of migrating to `Vec<T>` means there will be conversion noise where there was not going to be conversion noise before.

>> If `~[T]` remains used throughout the libraries, Rust will become noisier than languages like C++ with a unified vector type. The need to convert between `Vec<T>` and `~[T]` would add noise to lots of code, without adding any measurable optimization win. A micro-optimization shouldn't drive the design of the libraries, especially when it will prevent making a significant *macro*-optimization (passing a length to the deallocation function).
>
> In practice C++ libraries use their own custom vector types all over the place, so I wouldn't say that Rust is going to be significantly noisier no matter what we do. Interoperability between different libraries is not a strong point of C++.
>
> Besides, C++ has this too, with `unique_ptr<T[]>`. This Stack Overflow answer is actually pretty illuminating:

`std::unique_ptr<T[]>` is useful because lots of legacy code uses the new[]/delete[] memory allocations. Unique pointers also take a custom deleter parameter, because they're usable for managing stuff like files, etc. in C++.

> I think that length-frozen owned vectors are likely to be surprisingly common. We'll see.

They'll certainly be common if the standard library forces many conversions to and from `Vec<T>`...

It should not be stated that this conversion is free though, because it only remains free as long as you're using a legacy allocation API like `malloc`. It's also not free in terms of language complexity - people are going to wonder when they should use each one, and I know I'm certainly going to be telling people to use `Vec<T>` almost everywhere.
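For readers following along with current Rust, where `Box<[T]>` plays the role that `~[T]` was planned to play, the conversion being argued about can be observed directly. The following is only a minimal sketch and not code from the thread; `freeze` is a hypothetical helper name. It relies on the documented behaviour that `Vec::into_boxed_slice` drops excess capacity before converting, which is the "shrink" step whose cost is under discussion.

```rust
/// Hypothetical helper (not from the thread): build with a `Vec`, then
/// freeze the result into a boxed slice.
fn freeze(items: Vec<u32>) -> Box<[u32]> {
    // `into_boxed_slice` discards any excess capacity first, much like
    // `shrink_to_fit`, so it may reallocate when capacity > len.
    items.into_boxed_slice()
}

fn main() {
    // Reserve more room than is needed, as a growing builder often does.
    let mut v: Vec<u32> = Vec::with_capacity(16);
    v.extend([1, 2, 3]);
    assert!(v.capacity() >= 16);

    // After freezing, only the three elements remain; the spare capacity
    // is no longer part of the value.
    let frozen = freeze(v);
    assert_eq!(frozen.len(), 3);
    println!("frozen = {:?}", frozen);
}
```

Whether the shrink actually moves memory depends on the allocator, so this sketch shows where the cost can arise rather than how large it is.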
Re: [rust-dev] Reminder: ~[T] is not going away
On Wed, Apr 2, 2014 at 12:25 PM, Daniel Micay danielmi...@gmail.com wrote:
> Without a size parameter to `free`, an allocator needs to track the size of allocations manually. It increases the memory overhead, along with adding bookkeeping overhead.

Not by very much... If a chunk's header is stored externally, like tcmalloc and Linux slub, there is virtually no memory overhead at the cost of free involving a quick hash table lookup on the address; if it's stored internally, like jemalloc, the overhead is just possibly some page-size-remainder wastage, and free just masks the pointer. Either way, if chunks are ever going to be freed, you need some kind of header to count free slots.

I guess knowing the size would help the fast path for free be really simple and even inlined, since it could just swap a fixed thread-local variable. But is that really worth hanging language features on, one way or the other?
Re: [rust-dev] Reminder: ~[T] is not going away
Passing the size to free is currently in a C++14 proposal [1]. It's pretty useful (makes free no slower, might make it faster) and in most code, the size is available on free. I'm not sure it should be mandatory, but it's definitely useful.

[1] http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3536.html

On Wed, Apr 2, 2014 at 3:13 PM, comex com...@gmail.com wrote:
> On Wed, Apr 2, 2014 at 12:25 PM, Daniel Micay danielmi...@gmail.com wrote:
>> Without a size parameter to `free`, an allocator needs to track the size of allocations manually. It increases the memory overhead, along with adding bookkeeping overhead.
>
> Not by very much... If a chunk's header is stored externally, like tcmalloc and Linux slub, there is virtually no memory overhead at the cost of free involving a quick hash table lookup on the address; if it's stored internally, like jemalloc, the overhead is just possibly some page-size-remainder wastage, and free just masks the pointer. Either way, if chunks are ever going to be freed, you need some kind of header to count free slots.
>
> I guess knowing the size would help the fast path for free be really simple and even inlined, since it could just swap a fixed thread-local variable. But is that really worth hanging language features on, one way or the other?

-- Clark.

Key ID : 0x78099922
Fingerprint: B292 493C 51AE F3AB D016 DD04 E5E3 C36F 5534 F907
Re: [rust-dev] Reminder: ~[T] is not going away
On 02/04/14 03:18 PM, Clark Gaebel wrote:
> Passing the size to free is currently in a C++14 proposal [1]. It's pretty useful (makes free no slower, might make it faster) and in most code, the size is available on free. I'm not sure it should be mandatory, but it's definitely useful.
>
> [1] http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3536.html

Allocators already do take the size, so it already works for containers, etc.
Re: [rust-dev] Reminder: ~[T] is not going away
On 02/04/14 03:13 PM, comex wrote:
> On Wed, Apr 2, 2014 at 12:25 PM, Daniel Micay danielmi...@gmail.com wrote:
>> Without a size parameter to `free`, an allocator needs to track the size of allocations manually. It increases the memory overhead, along with adding bookkeeping overhead.
>
> Not by very much... If a chunk's header is stored externally, like tcmalloc and Linux slub, there is virtually no memory overhead at the cost of free involving a quick hash table lookup on the address; if it's stored internally, like jemalloc, the overhead is just possibly some page-size-remainder wastage, and free just masks the pointer. Either way, if chunks are ever going to be freed, you need some kind of header to count free slots.

You're talking about allocators designed around the limitation of an API. The design no longer needs to make the same compromises if you're going to know the size. The difference between no cache miss and a cache miss is not insignificant...

> I guess knowing the size would help the fast path for free be really simple and even inlined, since it could just swap a fixed thread-local variable.

It's a significant optimization. There's a reason this was included in the C++ allocator design and is being extended to more of the language in C++14.

> But is that really worth hanging language features on, one way or the other?

Is it really worth designing the language around the micro-optimization of leaving off a capacity field? Rust's syntax is verbose enough without needing to convert to and from vector/string builders all the time.
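As a point of reference for the "size passed to free" design, the raw allocation functions in today's Rust standard library take a `Layout` (size and alignment) on deallocation as well as on allocation. The sketch below is illustrative only; `sum_via_raw_alloc` is a hypothetical function invented for this example, and it says nothing about how any particular allocator uses the size internally.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Hypothetical demo function: allocate `n` u64 slots, fill them with
/// 0..n, sum them, and free the block again.
fn sum_via_raw_alloc(n: usize) -> u64 {
    // `alloc` must not be called with a zero-size layout.
    assert!(n > 0, "zero-size requests are not passed to the allocator");
    let layout = Layout::array::<u64>(n).expect("layout overflow");
    unsafe {
        let ptr = alloc(layout) as *mut u64;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // Initialise every slot before reading it.
        for i in 0..n {
            ptr.add(i).write(i as u64);
        }
        let mut total = 0u64;
        for i in 0..n {
            total += ptr.add(i).read();
        }
        // The caller hands the size and alignment back to the allocator,
        // so the allocator does not have to look them up from the address.
        dealloc(ptr as *mut u8, layout);
        total
    }
}

fn main() {
    println!("sum = {}", sum_via_raw_alloc(4));
}
```

The owner of a vector always knows its capacity, which is why a growable vector can supply this `Layout` without extra bookkeeping.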
Re: [rust-dev] Reminder: ~[T] is not going away
On 02/04/14 03:13 PM, comex wrote:
> On Wed, Apr 2, 2014 at 12:25 PM, Daniel Micay danielmi...@gmail.com wrote:
> But is that really worth hanging language features on, one way or the other?

This also isn't the only optimization lost here. Zero-size allocations will need to be clamped to one if passing a size to free isn't required. Why?

Rust uses a non-nullable pointer optimization, where Option<~T> and similar enums can be stored without a tag. This optimization should also be extended to types like slices in the future. It applies to the current `~[T]` but would need to be adapted to a new representation.

It's important to avoid allocating for a zero-size allocation, in order to save memory for ~Trait with zero-size types and to avoid allocating in zero-size vectors. However, this means that a zero-size allocation needs to be represented as non-null. Rust needs a way of knowing that despite being non-null, there is no allocated capacity.

For example, consider a 0-size slice:

    (0x22, 0)

When this is passed to `free`, Rust needs to be sure that a 0-size slice also has a 0-size capacity. In order to do that, shrink_to_fit() needs to happen during Vec<T> -> ~[T] conversions.

At the moment, Rust is completely broken in this regard. The following expression evaluates to None:

    Some(~())

I have no sane proposal to fix this beyond passing a size to free.
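The properties described in this message can be checked in current Rust, with `Box<T>` in place of `~T`. This is a small sketch written for illustration, not code from the thread, and `option_box_is_pointer_sized` is a hypothetical helper name. It checks that `Option<Box<T>>` needs no separate tag, that boxing a zero-size value still yields `Some`, and that an empty vector holds no capacity.

```rust
use std::mem::size_of;

/// Hypothetical helper: true when `Option<Box<u32>>` is stored without a
/// separate tag, i.e. the null pointer value is used to encode `None`.
fn option_box_is_pointer_sized() -> bool {
    size_of::<Option<Box<u32>>>() == size_of::<Box<u32>>()
}

fn main() {
    assert!(option_box_is_pointer_sized());

    // Boxing a zero-size value does not allocate, yet the box is still a
    // non-null pointer, so wrapping it in `Some` cannot be mistaken for `None`.
    let unit: Option<Box<()>> = Some(Box::new(()));
    assert!(unit.is_some());

    // An empty vector does not allocate either.
    let empty: Vec<u64> = Vec::new();
    assert_eq!(empty.capacity(), 0);

    println!("all checks passed");
}
```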
Re: [rust-dev] Reminder: ~[T] is not going away
> At the moment, Rust is completely broken in this regard. The following expression evaluates to None: Some(~())

Ouch, this is a disaster. Is there a bug filed for this?

Anyway, I don't get your argument about size to free having anything to do with fixing it (although I agree that size to free is awesome).

If you don't care about equality (i.e. the fact that &*~() != &*~(), but a == a where a = &*~()), just return the address of a single private static 1-byte item for any 0-sized allocation.

If you DO care about equality, then you will need at least an integer allocation scheme in all cases on 32-bit platforms, and the real costs are the data structures to track that (at least a bit in a bitmask, probably at least 2 bits for an efficient implementation). If you can't use the 1-2GB of kernel address space, then you'll also need to allocate one byte of actual usable address space (but not committed memory).

On 64-bit platforms, you generally have at least around 2^60-2^63 bytes of unusable address space, so you can just increment a pointer pointing there for each allocation, at zero cost.

Of course the quick and simple fix is to try to call malloc(0) and if it returns NULL, remember that and switch to using malloc(1) instead.
Re: [rust-dev] Reminder: ~[T] is not going away
Clamping `malloc(0)` to `malloc(1)` means that allocations of 0-size types will no longer be free, which is sad. It's very useful to be able to meet the requirement of having a trait object and avoid any memory allocation if there's no state.

The sentinel does work, but adds a branch to *every* free call. It will not optimize out even for cases where the size is fixed at compile time. This isn't a significant issue for the default allocator because it will be complex, but it's a significant issue with a bump/arena allocator, or a simple free list.

It's less overhead than not having a size available will be, but why not kill two birds with one stone?
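To make the trade-off concrete, here is a rough sketch in current Rust of an allocation wrapper that never allocates for zero-size requests and receives the size again on free. It is an illustration under stated assumptions, not the fix referred to in the thread: `allocate` and `deallocate` are hypothetical helpers, and the non-null placeholder address is simply the layout's alignment. Because the size is an argument to `deallocate`, the zero-size check is on a value the caller already has, and it can be folded away by the compiler when the layout is a compile-time constant.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Hypothetical helper: allocate memory for `layout`, handing out a
/// non-null placeholder instead of allocating for zero-size requests.
fn allocate(layout: Layout) -> NonNull<u8> {
    if layout.size() == 0 {
        // Non-null and suitably aligned; never dereferenced and never
        // passed to `dealloc`.
        return NonNull::new(layout.align() as *mut u8).expect("alignment is never zero");
    }
    // SAFETY: the layout has a non-zero size.
    let ptr = unsafe { alloc(layout) };
    NonNull::new(ptr).unwrap_or_else(|| handle_alloc_error(layout))
}

/// Hypothetical helper: free a block obtained from `allocate`.
///
/// # Safety
/// `ptr` must come from `allocate(layout)` with the same `layout`, and
/// must not be used afterwards.
unsafe fn deallocate(ptr: NonNull<u8>, layout: Layout) {
    // The size tells us whether anything was allocated at all; no
    // comparison against a sentinel address is needed.
    if layout.size() != 0 {
        unsafe { dealloc(ptr.as_ptr(), layout) };
    }
}

fn main() {
    let zero = Layout::new::<()>();
    let p = allocate(zero);
    unsafe { deallocate(p, zero) };

    let word = Layout::new::<u64>();
    let q = allocate(word);
    unsafe {
        q.as_ptr().cast::<u64>().write(7);
        println!("stored {}", q.as_ptr().cast::<u64>().read());
        deallocate(q, word);
    }
}
```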
Re: [rust-dev] Reminder: ~[T] is not going away
Personally, I'm strongly against using ~[] as return values from library functions.

Imagine we were in a world where we only had Vec<T> and were adding a new type OwnedSlice<T> that was (pointer, length) like ~[T]. For how many library functions would we say "it is sensible to throw away the capacity information before returning"? I don't think anything in libstd etc. would have a strong 'yes' answer to this question.

Specifically, I don't see any concrete positives to doing this for library functions other than "let's keep using ~[T]" and ~[T] and &[T] having the same in-memory representation (covered below).

Under any scheme I can think of, there are negatives:

1. without calling shrink_to_fit in the conversion, we lose the ability to have sized deallocations (covered by others in this thread)

2. if we do call it, then anything returning a ~[T] after building it with a Vec<T> is unavoidably slower

3. either way, you're throwing away (the knowledge of) any extra capacity that was allocated, so if someone wishes to continue extending the slice returned by e.g. `foo`, then `let v = foo().into_vec(); v.push(1)` will always require a realloc. (And for library functions, we shouldn't be dictating how people use the return values.)

4. it adds two vector-like types that someone needs to think about: in the common case the benefits of ~[] (one word smaller) are completely useless, it's really only mostly-immutable heavily-nested data types with a lot of vectors like Rust's AST where it helps[1]. I.e. almost all situations are fine (or better) with a Vec.

5. how will the built-in ~[] type use allocators? (well, I guess this is really "how will the built-in ~ type use allocators?", but that question still needs answering[2].)

On the representation of ~[T] and &[T] being the same: this means that theoretically a ~[T] in covariant(?) position can be coerced to a &[T], e.g. Vec<~[T]> -> Vec<&[T]>. However, this only really matters for functions returning many nested slices/vectors, e.g. the same Vec example, because pretty much anything else will be able to write `vec.as_slice()` cheaply. (In the code base, the only things mentioning /~[~[/ now are a few tests and things handling the raw argc/argv, i.e. returning ~[~[u8]].)

I don't think this should be a major concern, because I don't see us suddenly growing a pile of new functions returning ~[~[T]], and if we do, I would think that they would be better suited to being an iterator (assuming that's possible) over Vec's, and these internal Vec can then be mapped to ~[T] cheaply before collecting the iterator to a whole new Vec<Vec> (or Vec<~[]>) (assuming a &[Vec]/&[~[]] is wanted).

I'm concerned we are wanting to stick with ~[T] because it's what we currently have, and is familiar; as I said above, I don't see many positives for doing it for library functions.

Huon

[1]: And even in those cases, it's not a particularly huge gain, e.g. taking *two* words off the old OptVec type by replacing it with a library equivalent to DST's ~[T] only gained about 40MB: http://huonw.github.io/isrustfastyet/mem/#f5357cf,bbf8cdc

[2]: The sanest way to support allocators I can think of would be changing `~T` to `Uniq<T, A=DefaultAlloc>`, and then we have `Uniq<[T]>` which certainly feels less attractive than `~[T]`.

On 03/04/14 02:35, Alex Crichton wrote:
> I've noticed recently that there seems to be a bit of confusion about the fate of ~[T] with an impending implementation of DST on the horizon. This has been accompanied with a number of pull requests to completely remove many uses of ~[T] throughout the standard distribution. I'd like to take some time to straighten out what's going on with Vec<T> and ~[T].
>
> # Vec<T>
>
> In a post-DST world, Vec<T> will be the vector builder type. It will be the only type for building up a block of contiguous elements. This type exists today, and lives inside of std::vec. Today, you cannot index Vec<T>, but this will be enabled in the future once the indexing traits are fleshed out.
>
> This type will otherwise largely not change from what it is today. It will continue to occupy three words in memory, and continue to have the same runtime semantics.
>
> # ~[T]
>
> The type ~[T] will still exist in a post-DST world, but its representation will change. Today, a value of type ~[T] is one word (I'll elide the details of this for now). After DST is implemented, ~[T] will be a two-word value of the length and a pointer to an array (similarly to what slices are today). The ~[T] type will continue to have move semantics, and you can borrow it to &[T] as usual.
>
> The major difference between today's ~[T] type and a post-DST ~[T] is that the push() method will be removed. There is no knowledge of a capacity in the representation of a ~[T] value, so a push could not be supported at all. In theory a pop() can be efficiently supported, but it will likely not be implemented at first.
>
> # [T]
>
> As part of DST, the type grammar will start
Re: [rust-dev] Reminder: ~[T] is not going away
On 4/2/14 2:51 PM, Huon Wilson wrote: Specifically, I don't see any concrete positives to doing this for library functions other than lets keep using ~[T] and ~[T] [T] having the same in-memory representation (covered below). Under any scheme I can think of, there are negatives: 1. without calling shrink_to_fit in the conversion, we lose the ability to have sized deallocations (covered by others in this thread) 2. if we do call it, then anything returning a ~[T] after building it with a VecT is unavoidably slower 3. either way, you're throwing away (the knowledge of) any extra capacity that was allocated, so if someone wishes to continue extending the slice returned by e.g. `foo`, then `let v = foo().into_vec(); v.push(1)` will always require a realloc. (And for library functions, we shouldn't be dictating how people use the return values.) 4. it adds two vector-like types that someone needs to think about: in the common case the benefits of ~[] (one word smaller) are completely useless, it's really only mostly-immutable heavily-nested data types with a lot of vectors like Rust's AST where it helps[1]. I.e. almost all situations are fine (or better) with a Vec. 5. how will the built-in ~[] type use allocators? (well, I guess this is really how will the built-in ~ type use allocators?, but that question still needs answering[2].) On the representation of ~[T] and [T] being the same: this means that theoretically a ~[T] in covariant(?) position can be coerced to a [T], e.g. Vec~[T] - Vec[T]. However, this only really matters for functions returning many nested slices/vectors, e.g. the same Vec example, because pretty much anything else will be able to write `vec.as_slice()` cheaply. (In the code base, the only things mentioning /~[~[/ now are a few tests and things handling the raw argc/argv, i.e. returning ~[~[u8]].) 
I don't think this should be a major concern, because I don't see us suddenly growing functions a pile of new functions returning ~[~[T]], and if we do, I would think that they would be better suited to being an iterator (assuming that's possible) over Vec's, and these internal Vec can be then be mapped to ~[T] cheaply before collecting the iterator to a whole new VecVec (or Vec~[]) (assuming a [Vec]/[~[]] is wanted). I'm concerned we are wanting to stick with ~[T] because it's what we currently have, and is familiar; as I said above, I don't see many positives for doing it for library functions. What about strings? Should we be using `StrBuf` as well? Patrick ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Reminder: ~[T] is not going away
On 03/04/14 08:54, Patrick Walton wrote: On 4/2/14 2:51 PM, Huon Wilson wrote: Specifically, I don't see any concrete positives to doing this for library functions other than lets keep using ~[T] and ~[T] [T] having the same in-memory representation (covered below). Under any scheme I can think of, there are negatives: 1. without calling shrink_to_fit in the conversion, we lose the ability to have sized deallocations (covered by others in this thread) 2. if we do call it, then anything returning a ~[T] after building it with a VecT is unavoidably slower 3. either way, you're throwing away (the knowledge of) any extra capacity that was allocated, so if someone wishes to continue extending the slice returned by e.g. `foo`, then `let v = foo().into_vec(); v.push(1)` will always require a realloc. (And for library functions, we shouldn't be dictating how people use the return values.) 4. it adds two vector-like types that someone needs to think about: in the common case the benefits of ~[] (one word smaller) are completely useless, it's really only mostly-immutable heavily-nested data types with a lot of vectors like Rust's AST where it helps[1]. I.e. almost all situations are fine (or better) with a Vec. 5. how will the built-in ~[] type use allocators? (well, I guess this is really how will the built-in ~ type use allocators?, but that question still needs answering[2].) On the representation of ~[T] and [T] being the same: this means that theoretically a ~[T] in covariant(?) position can be coerced to a [T], e.g. Vec~[T] - Vec[T]. However, this only really matters for functions returning many nested slices/vectors, e.g. the same Vec example, because pretty much anything else will be able to write `vec.as_slice()` cheaply. (In the code base, the only things mentioning /~[~[/ now are a few tests and things handling the raw argc/argv, i.e. returning ~[~[u8]].) 
I don't think this should be a major concern, because I don't see us suddenly growing functions a pile of new functions returning ~[~[T]], and if we do, I would think that they would be better suited to being an iterator (assuming that's possible) over Vec's, and these internal Vec can be then be mapped to ~[T] cheaply before collecting the iterator to a whole new VecVec (or Vec~[]) (assuming a [Vec]/[~[]] is wanted). I'm concerned we are wanting to stick with ~[T] because it's what we currently have, and is familiar; as I said above, I don't see many positives for doing it for library functions. What about strings? Should we be using `StrBuf` as well? Patrick ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev I don't see why not. The same arguments apply. Huon ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
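Points 3 and 4 of the quoted list can be tried out in current Rust, where `Box<[T]>` corresponds to the post-DST `~[T]`. The sketch below is illustrative and not from the thread; `foo` here is a hypothetical stand-in for "a library function returning a frozen slice", matching the name used in point 3.

```rust
use std::mem::size_of;

/// Hypothetical library function that returns a frozen slice.
fn foo() -> Box<[i32]> {
    vec![10, 20, 30].into_boxed_slice()
}

fn main() {
    // Point 4: the boxed slice is one word smaller than the vector
    // (pointer + length versus pointer + capacity + length).
    assert_eq!(size_of::<Vec<i32>>(), 3 * size_of::<usize>());
    assert_eq!(size_of::<Box<[i32]>>(), 2 * size_of::<usize>());

    // Point 3: converting back gives a vector whose capacity equals its
    // length, because that is all the boxed slice knows about.
    let mut v = foo().into_vec();
    assert_eq!(v.capacity(), v.len());

    // There is no spare room, so this push has to grow the buffer.
    v.push(1);
    assert!(v.capacity() > 3);
    assert_eq!(v, [10, 20, 30, 1]);
}
```

Had `foo` returned the `Vec<i32>` it was built with, any spare capacity would have been available to the caller's `push`.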
Re: [rust-dev] Reminder: ~[T] is not going away
On Wed, Apr 02, 2014 at 04:03:37PM -0400, Daniel Micay wrote:
> I have no sane proposal to fix this beyond passing a size to free.

I don't believe there is a problem with just not using null to represent such pointers (for example, 1 would suffice). This does impose some additional burdens on slice conversion and the like.

This conversation has focused on low-level effects, which is important to understand, but I think the bigger question is: how do we WANT the language to look? Is it useful to have a distinct `Vec<T>` and `~[T]` or -- in our ideal world -- would they be the same? I think we can make the interconversion fast for the default allocator, but we should design for the language we want to use.

I could go either way on this. In the kind of programs I write, at least, most vectors get built up to a specific length and then stop growing (frequently they stop changing as well, but not always). Sometimes they continue growing. I actually rather like the idea of using `Vec<T>` as a kind of builder and `~[T]` as the end-product. In those cases where the vector continues to grow, of course, I can just keep the `Vec<T>` around. Following this logic, I would imagine that most APIs want to consume and produce `~[T]`, since they consume and produce end products.

On the other hand, I could imagine and appreciate an argument that we should just take and produce `Vec<T>`, which gives somewhat more flexibility. In general, Rust takes the philosophy that if you own it, you can mutate it, so why make growing harder than it needs to be? Preferring Vec<T> also means fewer choices, usually a good thing.

Perhaps the best thing is to wait a month (or two or three) until DST is more of a reality and then see how we feel.

Niko
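The two API styles weighed in this message can be put side by side in current Rust, again using `Box<[T]>` where the thread says `~[T]`. This is a sketch for comparison only; `squares` and `squares_growable` are hypothetical functions invented for the example, not proposed library APIs.

```rust
/// Hypothetical API in the "builder, then end product" style: the vector
/// is an internal builder and the caller receives a length-frozen slice.
fn squares(n: u32) -> Box<[u32]> {
    // Reserving exactly `n` slots means nothing is left to shrink away
    // when the vector is frozen.
    let mut builder = Vec::with_capacity(n as usize);
    for i in 0..n {
        builder.push(i * i);
    }
    builder.into_boxed_slice()
}

/// Hypothetical API in the "just produce a vector" style: the caller owns
/// the result and may keep growing it.
fn squares_growable(n: u32) -> Vec<u32> {
    (0..n).map(|i| i * i).collect()
}

fn main() {
    let fixed = squares(4);
    assert_eq!(fixed.to_vec(), vec![0, 1, 4, 9]);

    let mut growing = squares_growable(4);
    growing.push(16); // only possible directly on the vector
    assert_eq!(growing, vec![0, 1, 4, 9, 16]);
}
```

The first style documents that the result is finished; the second leaves the decision to the caller, which is the flexibility argument made above.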
Re: [rust-dev] Reminder: ~[T] is not going away
On 02/04/14 07:22 PM, Niko Matsakis wrote:

On Wed, Apr 02, 2014 at 04:03:37PM -0400, Daniel Micay wrote:

I have no sane proposal to fix this beyond passing a size to free.

I don't believe there is a problem with just not using null to represent such pointers (for example, 1 would suffice). This does impose some additional burdens on slice conversion and the like.

I used a sentinel value in my fix along with providing a guarantee that `free` is never called on a zero-size allocation. That's the end of any no-op conversions between `Vec<T>` and `~[T]`, since the conversion will need to free a zero-size allocation. It's not far from just calling `shrink_to_fit`, and allowing for passing a size to `free`.

https://github.com/mozilla/rust/pull/13267

I don't think there's any way around it without making `~ZeroSizeType` start allocating memory or losing the `Option<NonNullablePointer>` optimization otherwise.

This conversation has focused on low-level effects, which is important to understand, but I think the bigger question is: how do we WANT the language to look? Is it useful to have a distinct `Vec<T>` and `~[T]` or -- in our ideal world -- would they be the same? I think we can make the interconversion fast for the default allocator, but we should design for the language we want to use.

A distinct `~[T]` and `Vec<T>` will make the language more painful to use, so the only point I'm trying to counter is the performance one, because it *is* a valid micro-optimization in some cases. If our default allocation scheme takes advantage of a known size, then it will be faster. I don't think we should keep using a malloc/realloc/free-style API under the hood in the future.

I could go either way on this. In the kind of programs I write, at least, most vectors get built up to a specific length and then stop growing (frequently they stop changing as well, but not always). Sometimes they continue growing. I actually rather like the idea of using `Vec<T>` as a kind of builder and `~[T]` as the end-product. In those cases where the vector continues to grow, of course, I can just keep the `Vec<T>` around. Following this logic, I would imagine that most APIs want to consume and produce `~[T]`, since they consume and produce end products.

The language needs to be providing a significant safety/correctness guarantee or performance win in exchange for the extra noise, and I don't really think it will be in general. There will be use cases for `~[T]` but I don't think they will be common. If an API consumes `~[T]`, it will lose track of capacity the caller may already be able to provide. If it produces `~[T]`, it will lose track of capacity the caller may want to use later on.

On the other hand, I could imagine and appreciate an argument that we should just take and produce `Vec<T>`, which gives somewhat more flexibility. In general, Rust takes the philosophy that if you own it, you can mutate it, so why make growing harder than it needs to be? Preferring `Vec<T>` also means fewer choices, usually a good thing.

Perhaps the best thing is to wait a month (or two or three) until DST is more of a reality and then see how we feel.

Niko
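Three of the points in this message can be observed in present-day Rust, which may or may not match the 2014 implementation in detail. The sketch below assumes `Box<T>` and `Box<[T]>` as stand-ins for `~T` and `~[T]`; the function names are hypothetical.

```rust
use std::mem::size_of;

/// The non-nullable pointer optimization: `None` reuses the null bit pattern,
/// so wrapping an owned pointer in `Option` costs no extra space.
fn option_box_is_free<T>() -> bool {
    size_of::<Option<Box<T>>>() == size_of::<Box<T>>()
}

/// An empty vector has zero capacity and has not allocated, yet its pointer
/// is a non-null sentinel rather than null. Returns (capacity, pointer_is_null).
fn empty_vec_state() -> (usize, bool) {
    let v: Vec<u32> = Vec::new();
    (v.capacity(), v.as_ptr().is_null())
}

/// Capacity before and after a round trip through a boxed slice: the spare
/// room reserved by the builder is discarded by the conversion.
fn capacity_round_trip() -> (usize, usize) {
    let mut v: Vec<u8> = Vec::with_capacity(64);
    v.push(1);
    let before = v.capacity();
    let after = v.into_boxed_slice().into_vec().capacity();
    (before, after)
}
```

The last function is the capacity-loss argument in miniature: an API that hands back the boxed form cannot return the spare room to a caller who wanted to keep pushing.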
Re: [rust-dev] Reminder: ~[T] is not going away
On Apr 2, 2014, at 8:35 AM, Alex Crichton a...@crichton.co wrote:

As a concrete example, I'll take the read_to_end() method on io's Reader trait. This type must use a Vec<T> internally to read data into the vector, but it will return a ~[T] because the contents are conceptually frozen after they have been read.

This concrete example is great, because it precisely illustrates a major objection I have to returning ~[T].

Reader.read_to_end() internally uses a 64k-byte vector. It reserves 64k bytes, then pushes onto this vector until it hits EOF. Every time it fills up the 64k capacity it reserves another chunk and keeps reading (this, btw, is I think almost certainly unintended behavior and is fixed by #13127, which changes it to always keep 64k of space available for each read rather than potentially requesting smaller and smaller reads). Note that because it uses reserve_at_least() it may actually have more than 64k available. When EOF is reached, this vector is returned to the caller.

The problem I have with returning ~[T] here is that both choices for how to deal with this wasted space are terrible:

1. Shrink-to-fit before returning. If I'm going to keep the vector around for a long time this is a good idea, but if I'm just going to process the vector and throw it away, the reallocation was completely unnecessary.
2. Convert to ~[T] without shrinking. The caller has no way to know about the potentially massive amount of wasted space. If I'm going to just process the vector and throw it away that's fine, but if I'm going to keep it around for a while then this is terrible.

The only reasonable solution is to return the Vec<T> and let the caller decide if they want to shrink-to-fit or not.

-Kevin Ballard
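The trade-off in the two options above can be sketched with a reader that returns its buffer unshrunk and leaves the decision to the caller. This is an illustration in present-day Rust, not the 2014 `Reader::read_to_end` implementation; `read_all`, `checksum`, `keep` and `CHUNK` are hypothetical names.

```rust
use std::io::{self, Read};

/// 64 KiB, the chunk size mentioned in the message.
const CHUNK: usize = 64 * 1024;

/// Hypothetical reader: reserves CHUNK bytes of room before each append and
/// returns the Vec as-is, spare capacity included.
fn read_all<R: Read>(mut src: R) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    let mut chunk = vec![0u8; CHUNK];
    loop {
        match src.read(&mut chunk) {
            Ok(0) => return Ok(buf), // EOF
            Ok(n) => {
                buf.reserve(CHUNK);
                buf.extend_from_slice(&chunk[..n]);
            }
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}

/// Short-lived use: process the bytes and drop them, with no reallocation.
fn checksum(data: Vec<u8>) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

/// Long-lived use: the caller opts in to giving the spare capacity back.
fn keep(mut data: Vec<u8>) -> Vec<u8> {
    data.shrink_to_fit();
    data
}
```

Because `read_all` returns a `Vec<u8>`, the caller can inspect `capacity()` and choose between the two paths, which is exactly the information a boxed-slice return type would hide.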
Re: [rust-dev] Reminder: ~[T] is not going away
On Apr 2, 2014, at 3:01 PM, Huon Wilson dbau...@gmail.com wrote:

On 03/04/14 08:54, Patrick Walton wrote:

What about strings? Should we be using `StrBuf` as well?

I don't see why not. The same arguments apply.

I agree. I was actually quite surprised to see that the type was named StrBuf; I assumed it was going to be Str, just as Vec is not VecBuf.

I'm in full agreement with Huon on this matter. The standard libraries should return Vec<T> instead of ~[T] in pretty much every case (the only real exception I can think of is Vec<~[T]> because of the ability to convert to Vec[T] or [T]] for free). Similarly I think we should be returning StrBuf instead of ~str in all cases. And finally, I think we should just name it Str instead of StrBuf.

If developers want to use ~[T] and ~str in their own code, that's fine, but the standard libraries should err on the side of preserving information (e.g. capacity) and providing a consistent experience. If there's one thing I really want to avoid above all else, it's confusing people about whether they should be using ~[T] or Vec<T>, because some standard library code uses one and some code uses the other.

-Kevin