Serhat,

That implementation looks very tidy. But it still uses uintptr. So it 
doesn't solve the GC problems discussed above.

On Thursday, September 26, 2019 at 10:59:08 PM UTC+2, Serhat Şevki Dinçer 
wrote:
>
> Hi,
>
> I wrote a string utility library <https://github.com/jfcg/sixb> with many 
> tests to ensure the conversion assumptions are correct. It could be helpful.
>
> Cheers..
>
> On Wednesday, September 18, 2019 at 12:42:31 PM UTC+3, Francis wrote:
>>
>> I am looking at the correct way to convert from a byte slice to a string 
>> and back with no allocations. All very unsafe.
>>
>> I think these two cases are fairly symmetrical. So to simplify the 
>> discussion below I will only talk about converting from a string to []byte.
>>
>> func StringToBytes(s string) (b []byte)
>>
>> From what I have read it is currently not clear how to perform this 
>> correctly.
>>
>> When I say correctly I mean that the function returns a `[]byte` which 
>> contains all of and only the bytes in the string and never confuses the 
>> garbage collector. We fully expect that the `[]byte` returned will contain 
>> the same underlying memory as the string and modifying its contents will 
>> modify the string, making the string dangerously mutable. We are 
>> comfortable with the dangerously mutable string.
>>
>> Following the directions in unsafe you _might_ think that this would be a 
>> good solution.
>>
>> func StringToBytes(s string) []byte {
>>         return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
>>                 Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
>>                 Len:  len(s),
>>                 Cap:  len(s),
>>         }))
>> }
>>
>> The line
>>
>> Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
>>
>> here is a really carefully commented example of this approach from github
>>
>> seems to satisfy unsafe <https://golang.org/pkg/unsafe/> rule 5 about 
>> converting uintptr to and from unsafe.Pointer in a singe expression.
>>
>> However, it clearly violates rule 6 which states `...SliceHeader and 
>> StringHeader are only valid when interpreting the content of an actual 
>> slice or string value.`. The `[]byte` we are returning here is built from a 
>> `&reflect.SliceHeader` and not based on an existing `[]byte`.
>>
>> So we can switch to
>>
>> func StringToBytes(s string) (b []byte) {
>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>         sliceHeader.Data = stringHeader.Data
>>         sliceHeader.Len = stringHeader.Len
>>         sliceHeader.Cap = stringHeader.Len
>>         return b
>> }
>>
>> Now we are using an existing []byte to build `sliceHeader` which is good. 
>> But we end up with a new problem. sliceHeader.Data and stringHeader.Data 
>> are both uintptr. So by creating them in one expression and then writing 
>> them in another expression we violate the rule that `uintptr cannot be 
>> stored in variable`.
>>
>> There is a possible sense that we are protected because both of our 
>> `uinptr`s are actually real pointers inside a real string and []byte. This 
>> seems to be indicated by the line `In this usage hdr.Data is really an 
>> alternate way to refer to the underlying pointer in the string header, not 
>> a uintptr variable itself.`
>>
>> This feels very unclear to me.
>>
>> In particular the code example in the unsafe package
>>
>> var s string
>> hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1
>> hdr.Data = uintptr(unsafe.Pointer(p))              // case 6 (this case)
>> hdr.Len = n
>>
>>
>> is not the same as the case we are dealing with here. Specifically in the 
>> unsafe package documentation we are writing from a uintpr stored in a 
>> separate variable to another uinptr. They are probably very similar in 
>> practice, but it isn't obvious and in my experience subtly incorrect code 
>> often comes from relying on vague understandings of important documents.
>>
>> If we assume that our uinptrs are safe because they are backed by real 
>> pointers then there is another issue with our string being garbage 
>> collected.
>>
>> func StringToBytes(s string) (b []byte) {
>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>         // Our string is no longer referenced anywhere and could 
>> potentially be garbage collected
>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>         sliceHeader.Data = stringHeader.Data
>>         sliceHeader.Len = stringHeader.Len
>>         sliceHeader.Cap = stringHeader.Len
>>         return b
>> }
>>
>>
>> There is a discussion where this potential problem is raised
>>
>> https://github.com/golang/go/issues/25484
>>
>> we also see this issue mentioned in
>>
>> https://groups.google.com/forum/#!msg/golang-nuts/dcjzJy-bSpw/tcZYBzQqAQAJ
>>
>> The solution of 
>>
>> func StringToBytes(s string) (b []byte) {
>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>         // Our string is no longer referenced anywhere and could 
>> potentially be garbage collected
>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>         sliceHeader.Data = stringHeader.Data
>>         sliceHeader.Len = stringHeader.Len
>>         sliceHeader.Cap = stringHeader.Len
>>                  runtime.KeepAlive(&s)
>>         return b
>> }
>>
>>
>> is proposed. This _probably_ works. But a survey of the implementations 
>> of unsafe string/[]byte conversions in Go projects that we depend on at 
>> work (this operation is very common), didn't show a single example of 
>> anyone using the KeepAlive trick.
>>
>> In particular the person who initiated the conversation in golang-nuts 
>> where the KeepAlive was suggested has implemented this conversion without it
>>
>> https://github.com/cespare/xxhash/blob/v2.1.0/xxhash_unsafe.go#L28
>>
>> A workaround of writing 
>>
>> func StringToBytes(s string) (b []byte) {
>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>         sliceHeader.Data = stringHeader.Data
>>         sliceHeader.Len = len(s)
>>         sliceHeader.Cap = len(s)
>>         // Maybe we managed to hold onto s until here?
>>         return b
>> }
>>
>>
>> was proposed. I think the reasoning here is that the references to 
>> `len(s)` keep the string alive. I am not totally convinced because I think 
>> the compiler is free to write this as 
>>
>> func StringToBytes(s string) (b []byte) {
>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>         sliceHeader.Len = len(s)
>>         sliceHeader.Cap = len(s)
>>         // Compiler has reordered our code, and s might be garbage 
>> collected
>>         sliceHeader.Data = stringHeader.Data
>>         return b
>> }
>>
>> but, maybe this modification can never happen.
>>
>> At this point I don't think we have any clear answers about how to write 
>> this code correctly.
>>
>> If we look inside the Go codebase we see a few interesting approaches
>>
>> going from []byte to string we can just
>>
>> return *(*string)(unsafe.Pointer(&b))
>>
>>
>> This approach is used in 
>>
>> https://golang.org/src/strings/builder.go#L45
>> https://github.com/golang/go/blob/master/src/runtime/string.go#L152
>>
>> So we can see that the Go std-lib isn't using Slice/StringHeader to 
>> perform these conversions, which seems like a shame.
>>
>> Looking at how this is done on github reveals a variety of different 
>> approaches and discussion on the topic don't seem to ever be conclusive.
>>
>>
>>
>> In general it seems that 
>>
>> func StringToBytes(s string) (b []byte) {
>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>         sliceHeader.Data = stringHeader.Data
>>         sliceHeader.Len = len(s)
>>         sliceHeader.Cap = len(s)
>>         return b
>> }
>>
>>
>> is probably pretty good, and here is a very carefully commented example 
>> of it from github.
>>
>> https://github.com/m3db/m3x/blob/master/unsafe/string.go#L62
>>
>> Is there an authoritative correct way to do this?
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/21938acb-e242-47a0-8742-2e8b5f9ee760%40googlegroups.com.

Reply via email to