daniel-adam-tfs commented on PR #654:
URL: https://github.com/apache/arrow-go/pull/654#issuecomment-3836024225
@zeroshade One of our departments has integrated byte-stream-split encoding/decoding into the proprietary format they currently use to store data. We ran some comparisons and they were getting faster decoding. I looked into their code and found that they use a SIMD implementation built around `VPUNPCKLBW`, written in C#. I took their fallback and SIMD implementations and fed them to Claude (I haven't seen much assembler since college myself). It gave me these implementations, which are really fast. The fallback alone is faster than the current implementation, so I'd replace the current implementation with it. I've followed the file naming and build tags of the existing assembly in the repo, so it should be OK to add them.

I actually wrote C code first and tried c2goasm with AppleClang and clang 21.0, but I couldn't get it to generate compilable code. I also tried the new https://pkg.go.dev/simd/archsimd package that we're getting in go1.26, but it doesn't have a `VPUNPCKLBW` wrapper, so I couldn't get it to be as fast as the Claude-generated code.

Anyway, most of our data is float32s or float64s, so the BSS decoding function was at the top of our profile. After processing some files with this change, it dropped to about 10th place, and `memmove` and `LevelDecoder.Decode` are now the top two. I think I can do something about both. (I see potential improvements to level decoding for our case, and I should be able to get rid of some memmoves if I at some point figure out the other PR with the buffers. 😆 )
