I've heard several times from members of the community (on Matrix and
possibly on answers) that a simple iteration like
const mixed = "\b5Ὂg̀9! ℃ᾭG"
for _, c := range mixed {
	// ... do something with c (but not write to it)
}
will actually silently allocate a slice of runes and decode the string
into it, before iteration. I've heard it is done to prevent problems
that occur when a programmer might overwrite data being iterated, which
should be a no-brainer for programmers in general, but sure, whatever.
So is it true in the case of constants? Is it always true, or only when
writes occur to the source string or `c`?
And if it always occurs, wouldn't it be a great optimization to only
decode runes when we get to them?
Since hearing of this, I started doing all of my UTF-8 iteration of
string runes as such:
import "unicode/utf8"
str := "Some UTF-8 string."
var i int
for i < len(str) {
	// DecodeRuneInString is the string counterpart of DecodeRune,
	// which only accepts a []byte.
	r, size := utf8.DecodeRuneInString(str[i:])
	// do something with r...
	i += size
}
Which admittedly is a bit of a burden, and possibly premature
optimization if I'm wrong. But I just think it shouldn't silently
allocate a ginormous slice in the background to iterate the runes of a
string I might not read all of, and especially considering `for i, v :=
range` should be idiomatic.
I tried to make a test on Compiler Explorer
<https://godbolt.org/#z:OYLghAFBqd5TKALEBjA9gEwKYFFMCWALugE4A0BIEAZugHZEDKqAhgDbYgCMALOXUYBVAM7YACgA8QAcgAMM8gCse5dq3qhg6cmM6oiBBtWz1MAYXTsArgFt6IAEwA2cqcwAZAvWwA5OwBG2KQgABzkAA7oIsRG9JY29k6uUTGGDF4%2B/rZBIeF62AZxTESspEQJdg4uutj66fQlZUSZfoHBYbql5ZVJNSLdLd5tOR2hAJS66NakqFwyEayoANaswNgA1Las3gCkcgCC%2BwcEtlHlG7uOjjS2RFeOx8c01vSoWzv0EOOXAOwAQscNsCtgRJNhMBsQLsAMwAEUu1wAEnV2OgwGBLrhHqFeLtQjDdrhuLt/rhdgc4QBxB5AkF0UgbAjkDbvaHwjakDTrUHgyG7AF0kH0u4AOnEpG8RHYX2ZrPGQr%2BcKev2Vhxkk3YsgArIoHPJFOhZFT0BsRNNZpsrjDuIoiLIFONJssQNq5GpZLxFLYQLxtaLcS5uDDHHxtW7QtryPqFOQjTJFCIQO77QbJnBYCgMGcCJwKFQINmIrmOsB2IYIgJc0RgkmIAEHYoAt4ygBPWS28jZ2ymIgAeXo7HbBvIOG2mk4jdHBFIhUMADdsEmR9hwahrDWO4opXUp%2BwCAEuaRW5YcFOiJKfTJbZNBMw2JO%2BAIGERRBJpCOVNw1NyQNo1AeSaQJM6ARA0y6JnUc5xCYZi9NUrjuK02S5CAvyRNEsQMPByQYWkcTIe0IToQURQME0PRWFUuGkQ0FFDFkRFoV0zQ4f0gyEaMxGTOaMxzDwmo6nqU7xqwtiYM4vAbMA7wQBWPwQPgxBkIiNospYOZ5qp3A/CadqNk65BINgrA4CE3zkC6boejIXrRiJsiJsm5Cpo6gkyI43pOKEoqOLwziOBGACcfDcI4vxyPwMaGo5LkGeQGYIPAEBZlgeCECQ%2BbUHeLAcFwT53m%2BUhTiojg/pof46LRMEQO4bHoUhwwoR0QV4Vh8RUX06GpO1nGoa11XkYM9W1PUxQcU1TEDcNnXVCRE2MVxIBBTxFr8QsSyrDy2x7IcxynOcRCIjcdy0nthwvG8HzeN8fyAocwq2GCEJQrCCIPCi7BohiWI4niBJEiSZIUtSZ0HMKDJMiybJvZy3KbE9fJ3YqwoAPSoxsACSGxKNYAwbAA7hoR1EMZrLoJpwQbCQGzLPQ6AE9TSCsEd2NiYTkqGJo1OmojEKiijIJ8/yjiAhyAySpohYKg9woikQ4qS9KsrQzL4MggKapHKqxwajZur2SO8Ymmaa1Wo4Ib6WmkzGaZHQWVZ7parZXnarworcNqvwwqEzi/JGvC8CFUbRXGsVJim8WJSlaAU8WeaUNQRYliEZYVlW5a1tQDYjs29BtluXYUz2jADkOU5jtyk4joQs5FIuy6xquhQbvMnY7s7sb7oebanvMsYXqcW63i%2BuWPvwhViMVn5OOVWg6N3QEWaB4Hh1BZEOLVcGzU4UaNYtqHcNwKSYQ0bFRj1DR9R0R%2BuINjQzYk1RRvf9HXyEt8sZRT%2B71/DEjIfY%2Bq0%2BJcEcO5A2odRLiUktJWS8kNiKQyipa0ZUNgaXjlTFBul0BWzckZEyZlqDOldE7T0wkjbh2cq5DU7lPLkB9DCd23AgpBRhN7ZwIZA6hADobWM8ZcF62jigQgNAaCJ1oKPB8%2BUJ4viKh%2BLuSAkzfnYIokRNAiCtgiFwd0pBFGz10SINRGitHJnAeQvhsg4QEFERsMSEkpIyQQfA%2BcIhbHQIcXAggERxgCMMs7OyDDfiinDLwNhbtfjOGDLifyvCYoJl0FQgyxDuByFITIGE5i4m%2BMmIuUgMRjC8CAA%3D>
modifying the string being iterated, and not. I learned that I can't
read the assembly gc produces. But what I gathered is that it takes a
pointer to the original string and actually calls runtime.decoderune on
each iteration. Modifications to mixed change the variable mixed (to
point to the newly allocated string(s)), while the original string
pointer is kept on the stack while iterating. I'd like to have somebody
who can read the assembly confirm.
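
A way to check this without reading assembly is to measure allocations directly. The sketch below is only that, a sketch: it assumes testing.AllocsPerRun (usable outside of tests) gives a fair picture, and the helper names sumRunes, allocsForRange and rangeWhileReassigning are invented for this example.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the compiler from discarding the loop result.
var sink int

// sumRunes ranges over s and adds up the rune values, so every rune
// has to be decoded but none is stored.
func sumRunes(s string) int {
	sum := 0
	for _, c := range s {
		sum += int(c)
	}
	return sum
}

// allocsForRange reports the average number of heap allocations made
// by one full range loop over s.
func allocsForRange(s string) float64 {
	return testing.AllocsPerRun(100, func() { sink = sumRunes(s) })
}

// rangeWhileReassigning reassigns the ranged-over variable inside the
// loop and returns the runes the loop actually visited.
func rangeWhileReassigning(s string) string {
	var seen []rune
	for _, c := range s {
		s = "replaced" // changes the variable, not the string being iterated
		seen = append(seen, c)
	}
	return string(seen)
}

func main() {
	mixed := "\b5Ὂg̀9! ℃ᾭG"
	fmt.Println("allocations per range loop:", allocsForRange(mixed))
	fmt.Println("visited:", rangeWhileReassigning("héllo"))
}
```

If the range loop really decoded into a hidden rune slice first, the allocation count should be nonzero; I would expect it to report zero, and the second function to still visit every rune of the original string.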
If that's the case, then it completely voids my argument and concerns,
and it will make many people, including myself, very happy to learn that
it is in fact optimal.
In addition, I wonder if it's the same for other types being iterated?
What if a byte slice for example has some bytes modified during
iteration? It must create a copy, but it shouldn't copy for loops that
do not write to the iterated variable.
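
A small experiment for the byte slice case, as a sketch: the Go spec says the range expression is evaluated once before the loop, which for a slice means the slice header and not its elements, while an array is a value and so is copied. The function names rangeSlice and rangeArray are made up for this example.

```go
package main

import "fmt"

// rangeSlice overwrites a not-yet-visited element of a byte slice
// during iteration and returns the bytes the loop saw.
func rangeSlice() string {
	s := []byte("abc")
	var seen []byte
	for i, b := range s {
		if i == 0 {
			s[2] = 'X' // write to an element the loop has not reached yet
		}
		seen = append(seen, b)
	}
	return string(seen)
}

// rangeArray does the same with an array instead of a slice.
func rangeArray() string {
	a := [3]byte{'a', 'b', 'c'}
	var seen []byte
	for i, b := range a {
		if i == 0 {
			a[2] = 'X' // the loop iterates over a copy made before it started
		}
		seen = append(seen, b)
	}
	return string(seen)
}

func main() {
	fmt.Println("slice:", rangeSlice()) // slice: abX
	fmt.Println("array:", rangeArray()) // array: abc
}
```

So for a slice there is no copy and the loop sees the write, whereas ranging over an array value with both iteration variables works on a copy.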
Sorry for the rant, I'm passionate about it.
Luke