daniel-adam-tfs commented on PR #654:
URL: https://github.com/apache/arrow-go/pull/654#issuecomment-3836024225

   @zeroshade One of our departments has integrated byte-stream-split 
encoding/decoding into the proprietary format we currently use to store data. 
We ran some comparisons and their decoding was faster, so we looked into their 
code and found they were using a SIMD implementation with `VPUNPCKLBW` in C#. 
I took their fallback and SIMD implementations and fed them to Claude (I 
haven't seen much assembler since college myself) and it gave me these 
implementations, which are really fast.
   The fallback is pretty fast, faster than the current implementation, so I'd 
replace the current implementation with it. I've also copied the file names and 
build tags from the existing assembly in the repo, so the new files should be 
OK to add. 
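
   For readers unfamiliar with the layout, here is a minimal scalar sketch of 
what byte-stream-split decoding does, assuming the Parquet BYTE_STREAM_SPLIT 
layout (stream j holds byte j of every value). The names 
`decodeByteStreamSplit` and `encodeByteStreamSplit` are hypothetical and are 
not the arrow-go API or the code in this PR.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// decodeByteStreamSplit is a hypothetical helper, not the arrow-go API.
// src holds `width` byte streams of n bytes each; stream j carries byte j of
// every value. dst receives the n values re-interleaved, width bytes apiece.
// dst must be at least as long as src.
func decodeByteStreamSplit(dst, src []byte, width int) {
	n := len(src) / width
	for i := 0; i < n; i++ {
		for j := 0; j < width; j++ {
			dst[i*width+j] = src[j*n+i]
		}
	}
}

// encodeByteStreamSplit is the inverse, included only to build sample input.
func encodeByteStreamSplit(dst, src []byte, width int) {
	n := len(src) / width
	for i := 0; i < n; i++ {
		for j := 0; j < width; j++ {
			dst[j*n+i] = src[i*width+j]
		}
	}
}

func main() {
	vals := []float32{1.5, -2.25, 3.0}

	// Plain little-endian representation of the values.
	plain := make([]byte, 4*len(vals))
	for i, v := range vals {
		binary.LittleEndian.PutUint32(plain[4*i:], math.Float32bits(v))
	}

	// Round-trip through the split layout.
	enc := make([]byte, len(plain))
	encodeByteStreamSplit(enc, plain, 4)
	dec := make([]byte, len(plain))
	decodeByteStreamSplit(dec, enc, 4)

	for i := range vals {
		fmt.Println(math.Float32frombits(binary.LittleEndian.Uint32(dec[4*i:])))
	}
}
```

The inner loop reads with a stride of n, which is the access pattern that a 
SIMD byte-interleave replaces.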
   
   I actually wrote C code first and tried c2goasm with AppleClang and 
clang21.0, but I couldn't get it to generate compilable code. 
   I also tried the new https://pkg.go.dev/simd/archsimd package that 
we're getting in go1.26, but it doesn't have a `VPUNPCKLBW` wrapper, so I 
couldn't get it to be as fast as the Claude-generated code.
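
   To illustrate why a byte-interleave instruction matters here, the sketch 
below emulates the 128-bit form of the operation in plain Go and uses two 
rounds of it to rebuild four 4-byte values from four streams. The function 
`punpcklbw` is a hypothetical scalar stand-in, and the two-round sequence is 
an assumption about one way to use the instruction, not necessarily what the 
assembly in this PR does. `VPUNPCKLBW` on 256-bit registers applies the same 
interleave within each 128-bit lane.

```go
package main

import "fmt"

// punpcklbw emulates the 128-bit byte interleave: it takes the low 8 bytes
// of a and b and produces a0 b0 a1 b1 ... a7 b7. Hypothetical helper for
// illustration only.
func punpcklbw(a, b [16]byte) [16]byte {
	var out [16]byte
	for i := 0; i < 8; i++ {
		out[2*i] = a[i]
		out[2*i+1] = b[i]
	}
	return out
}

func main() {
	// Four byte streams of a width-4 type; the high nibble marks the stream
	// and the low nibble marks the value index.
	var s0, s1, s2, s3 [16]byte
	for i := 0; i < 16; i++ {
		s0[i] = byte(0x00 + i)
		s1[i] = byte(0x10 + i)
		s2[i] = byte(0x20 + i)
		s3[i] = byte(0x30 + i)
	}

	// Round 1: pair streams 0/2 and 1/3.
	x := punpcklbw(s0, s2) // s0[0] s2[0] s0[1] s2[1] ...
	y := punpcklbw(s1, s3) // s1[0] s3[0] s1[1] s3[1] ...

	// Round 2: interleaving x and y puts bytes 0..3 of each value side by side.
	out := punpcklbw(x, y)

	fmt.Printf("% x\n", out[:])
}
```

Each group of four consecutive output bytes is one reassembled value, so two 
rounds of interleaving replace sixteen strided byte loads.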
   
   
   Anyway, most of our data is float32s or float64s, so the bss decoding 
function was at the top of the profile. After processing some files with this 
change, it fell to around 10th place; `memmove` and `LevelDecoder.Decode` are 
the top 2 now. I think I can do something about both. (I see potential 
improvements to the level decoding for our case, and I should be able to get 
rid of some memmoves if I at some point figure out the other PR with the 
buffers. 😆 ) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
