Thank you Serhat :)

On Monday, October 7, 2019 at 10:04:05 AM UTC+2, Serhat Şevki Dinçer wrote:
>
> Turns out it did not really need uintptr so I switched to unsafe.Pointer. 
> Also added more tests.
>
> Thanks Francis..
>
> On Friday, October 4, 2019 at 12:38:22 PM UTC+3, fra...@adeven.com wrote:
>>
>> Serhat,
>>
>> That implementation looks very tidy. But it still uses uintptr. So it 
>> doesn't solve the GC problems discussed above.
>>
>> On Thursday, September 26, 2019 at 10:59:08 PM UTC+2, Serhat Şevki Dinçer 
>> wrote:
>>>
>>> Hi,
>>>
>>> I wrote a string utility library <https://github.com/jfcg/sixb> with 
>>> many tests to ensure the conversion assumptions are correct. It could be 
>>> helpful.
>>>
>>> Cheers..
>>>
>>> On Wednesday, September 18, 2019 at 12:42:31 PM UTC+3, Francis wrote:
>>>>
>>>> I am looking at the correct way to convert from a byte slice to a 
>>>> string and back with no allocations. All very unsafe.
>>>>
>>>> I think these two cases are fairly symmetrical. So to simplify the 
>>>> discussion below I will only talk about converting from a string to []byte.
>>>>
>>>> func StringToBytes(s string) (b []byte)
>>>>
>>>> From what I have read it is currently not clear how to perform this 
>>>> correctly.
>>>>
>>>> When I say correctly I mean that the function returns a `[]byte` which 
>>>> contains all of and only the bytes in the string and never confuses the 
>>>> garbage collector. We fully expect that the `[]byte` returned will contain 
>>>> the same underlying memory as the string and modifying its contents will 
>>>> modify the string, making the string dangerously mutable. We are 
>>>> comfortable with the dangerously mutable string.
>>>>
>>>> Following the directions in unsafe you _might_ think that this would be 
>>>> a good solution.
>>>>
>>>> func StringToBytes(s string) []byte {
>>>>         return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
>>>>                 Data: 
>>>> (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
>>>>                 Len:  len(s),
>>>>                 Cap:  len(s),
>>>>         }))
>>>> }
>>>>
>>>> The line
>>>>
>>>> Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
>>>>
>>>> here is a really carefully commented example of this approach from 
>>>> github
>>>>
>>>> seems to satisfy unsafe <https://golang.org/pkg/unsafe/> rule 5 about 
>>>> converting uintptr to and from unsafe.Pointer in a singe expression.
>>>>
>>>> However, it clearly violates rule 6 which states `...SliceHeader and 
>>>> StringHeader are only valid when interpreting the content of an actual 
>>>> slice or string value.`. The `[]byte` we are returning here is built from 
>>>> a 
>>>> `&reflect.SliceHeader` and not based on an existing `[]byte`.
>>>>
>>>> So we can switch to
>>>>
>>>> func StringToBytes(s string) (b []byte) {
>>>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>>>         sliceHeader.Data = stringHeader.Data
>>>>         sliceHeader.Len = stringHeader.Len
>>>>         sliceHeader.Cap = stringHeader.Len
>>>>         return b
>>>> }
>>>>
>>>> Now we are using an existing []byte to build `sliceHeader` which is 
>>>> good. But we end up with a new problem. sliceHeader.Data and 
>>>> stringHeader.Data are both uintptr. So by creating them in one expression 
>>>> and then writing them in another expression we violate the rule that 
>>>> `uintptr cannot be stored in variable`.
>>>>
>>>> There is a possible sense that we are protected because both of our 
>>>> `uinptr`s are actually real pointers inside a real string and []byte. This 
>>>> seems to be indicated by the line `In this usage hdr.Data is really an 
>>>> alternate way to refer to the underlying pointer in the string header, not 
>>>> a uintptr variable itself.`
>>>>
>>>> This feels very unclear to me.
>>>>
>>>> In particular the code example in the unsafe package
>>>>
>>>> var s string
>>>> hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1
>>>> hdr.Data = uintptr(unsafe.Pointer(p))              // case 6 (this case)
>>>> hdr.Len = n
>>>>
>>>>
>>>> is not the same as the case we are dealing with here. Specifically in 
>>>> the unsafe package documentation we are writing from a uintpr stored in a 
>>>> separate variable to another uinptr. They are probably very similar in 
>>>> practice, but it isn't obvious and in my experience subtly incorrect code 
>>>> often comes from relying on vague understandings of important documents.
>>>>
>>>> If we assume that our uinptrs are safe because they are backed by real 
>>>> pointers then there is another issue with our string being garbage 
>>>> collected.
>>>>
>>>> func StringToBytes(s string) (b []byte) {
>>>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>>>         // Our string is no longer referenced anywhere and could 
>>>> potentially be garbage collected
>>>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>>>         sliceHeader.Data = stringHeader.Data
>>>>         sliceHeader.Len = stringHeader.Len
>>>>         sliceHeader.Cap = stringHeader.Len
>>>>         return b
>>>> }
>>>>
>>>>
>>>> There is a discussion where this potential problem is raised
>>>>
>>>> https://github.com/golang/go/issues/25484
>>>>
>>>> we also see this issue mentioned in
>>>>
>>>>
>>>> https://groups.google.com/forum/#!msg/golang-nuts/dcjzJy-bSpw/tcZYBzQqAQAJ
>>>>
>>>> The solution of 
>>>>
>>>> func StringToBytes(s string) (b []byte) {
>>>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>>>         // Our string is no longer referenced anywhere and could 
>>>> potentially be garbage collected
>>>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>>>         sliceHeader.Data = stringHeader.Data
>>>>         sliceHeader.Len = stringHeader.Len
>>>>         sliceHeader.Cap = stringHeader.Len
>>>>                  runtime.KeepAlive(&s)
>>>>         return b
>>>> }
>>>>
>>>>
>>>> is proposed. This _probably_ works. But a survey of the implementations 
>>>> of unsafe string/[]byte conversions in Go projects that we depend on at 
>>>> work (this operation is very common), didn't show a single example of 
>>>> anyone using the KeepAlive trick.
>>>>
>>>> In particular the person who initiated the conversation in golang-nuts 
>>>> where the KeepAlive was suggested has implemented this conversion without 
>>>> it
>>>>
>>>> https://github.com/cespare/xxhash/blob/v2.1.0/xxhash_unsafe.go#L28
>>>>
>>>> A workaround of writing 
>>>>
>>>> func StringToBytes(s string) (b []byte) {
>>>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>>>         sliceHeader.Data = stringHeader.Data
>>>>         sliceHeader.Len = len(s)
>>>>         sliceHeader.Cap = len(s)
>>>>         // Maybe we managed to hold onto s until here?
>>>>         return b
>>>> }
>>>>
>>>>
>>>> was proposed. I think the reasoning here is that the references to 
>>>> `len(s)` keep the string alive. I am not totally convinced because I think 
>>>> the compiler is free to write this as 
>>>>
>>>> func StringToBytes(s string) (b []byte) {
>>>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>>>         sliceHeader.Len = len(s)
>>>>         sliceHeader.Cap = len(s)
>>>>         // Compiler has reordered our code, and s might be garbage 
>>>> collected
>>>>         sliceHeader.Data = stringHeader.Data
>>>>         return b
>>>> }
>>>>
>>>> but, maybe this modification can never happen.
>>>>
>>>> At this point I don't think we have any clear answers about how to 
>>>> write this code correctly.
>>>>
>>>> If we look inside the Go codebase we see a few interesting approaches
>>>>
>>>> going from []byte to string we can just
>>>>
>>>> return *(*string)(unsafe.Pointer(&b))
>>>>
>>>>
>>>> This approach is used in 
>>>>
>>>> https://golang.org/src/strings/builder.go#L45
>>>> https://github.com/golang/go/blob/master/src/runtime/string.go#L152
>>>>
>>>> So we can see that the Go std-lib isn't using Slice/StringHeader to 
>>>> perform these conversions, which seems like a shame.
>>>>
>>>> Looking at how this is done on github reveals a variety of different 
>>>> approaches and discussion on the topic don't seem to ever be conclusive.
>>>>
>>>>
>>>>
>>>> In general it seems that 
>>>>
>>>> func StringToBytes(s string) (b []byte) {
>>>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>>>         sliceHeader.Data = stringHeader.Data
>>>>         sliceHeader.Len = len(s)
>>>>         sliceHeader.Cap = len(s)
>>>>         return b
>>>> }
>>>>
>>>>
>>>> is probably pretty good, and here is a very carefully commented example 
>>>> of it from github.
>>>>
>>>> https://github.com/m3db/m3x/blob/master/unsafe/string.go#L62
>>>>
>>>> Is there an authoritative correct way to do this?
>>>>
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/d61622df-8a4f-45ed-90f4-4b1c0700c079%40googlegroups.com.

Reply via email to