Re: [I] Improve performance of constructing `ByteView`s for small strings [arrow-rs]

via GitHub Wed, 24 Jul 2024 07:36:53 -0700


mapleFU commented on issue #6034:
URL: https://github.com/apache/arrow-rs/issues/6034#issuecomment-2248170958


   https://godbolt.org/z/czePsTofc : `makeInline2` generates a lot of 
branches through a switch, which should be fine if all strings tend to have 
the same length. The previous version compiles to memset & memcpy calls:
   
   ```asm
           call    memset@PLT
           mov     rdi, r14
           mov     rsi, rbx
           mov     rdx, r15
           call    memcpy@PLT
   ```
   
   On the other hand, `makeInline2` generates a different code path for each 
bit width. For example:
   
   ```asm
   .LBB1_2:
           movzx   esi, byte ptr [rdi]
           xor     edx, edx
           shl     rsi, 32
           or      rax, rsi
           ret
   .LBB1_3:
           movzx   esi, word ptr [rdi]
           xor     edx, edx
           shl     rsi, 32
           or      rax, rsi
           ret
   ```
   
   If branch prediction works well, the same code path would be taken again 
and again.
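
   A rough Rust sketch of the two strategies compared above, assuming the 
Arrow inline view layout (a 4-byte little-endian length followed by up to 12 
bytes of data, zero padded). The names `make_inline_generic`, 
`make_inline_switch` and `inline_fixed` are hypothetical; this is not the code 
behind the godbolt link nor the arrow-rs implementation, and whether a compiler 
really emits memset/memcpy for the first and fixed-width moves for the second 
depends on the compiler and flags:

```rust
/// Generic version: zero a 16-byte buffer, then copy a runtime-sized payload.
/// With an unknown length, compilers may lower the copy to a memcpy call.
fn make_inline_generic(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "only strings of up to 12 bytes are inlined");
    let mut buf = [0u8; 16];
    // Bytes 0..4 hold the length, bytes 4..16 hold the zero-padded data.
    buf[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    buf[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(buf)
}

/// Copies exactly `N` bytes. `N` is a compile-time constant, so the copy
/// has a fixed width that the compiler can turn into plain loads and stores.
fn inline_fixed<const N: usize>(data: &[u8]) -> u128 {
    let mut buf = [0u8; 16];
    buf[..4].copy_from_slice(&(N as u32).to_le_bytes());
    buf[4..4 + N].copy_from_slice(&data[..N]);
    u128::from_le_bytes(buf)
}

/// Switch version: one branch per length, each with a fixed-width copy.
fn make_inline_switch(data: &[u8]) -> u128 {
    match data.len() {
        0 => inline_fixed::<0>(data),
        1 => inline_fixed::<1>(data),
        2 => inline_fixed::<2>(data),
        3 => inline_fixed::<3>(data),
        4 => inline_fixed::<4>(data),
        5 => inline_fixed::<5>(data),
        6 => inline_fixed::<6>(data),
        7 => inline_fixed::<7>(data),
        8 => inline_fixed::<8>(data),
        9 => inline_fixed::<9>(data),
        10 => inline_fixed::<10>(data),
        11 => inline_fixed::<11>(data),
        12 => inline_fixed::<12>(data),
        _ => panic!("only strings of up to 12 bytes are inlined"),
    }
}
```

   Both functions return the same value for every length from 0 to 12; they 
differ only in how the copy is expressed to the compiler.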
   
   > some string lengths are fast, e.g., 4 and 8, because they align with 
register width.
   
   AFAIK, if the same instruction is used and the access does not cross a 
cache line, a memory-copy instruction should not suffer from unaligned access? 
(Maybe I'm wrong)
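
   To make the "no cross cache-line" condition concrete, here is a small 
sketch in Rust. The helper `crosses_cache_line` is hypothetical and the 64-byte 
line size is an assumption (common on x86-64, but not universal); it only 
illustrates the arithmetic and says nothing about actual hardware cost:

```rust
/// Assumed cache-line size in bytes (an assumption, not a universal value).
const CACHE_LINE: usize = 64;

/// Returns true if reading `len` bytes starting at address `addr`
/// touches more than one cache line.
fn crosses_cache_line(addr: usize, len: usize) -> bool {
    if len == 0 {
        return false;
    }
    // Compare the line index of the first and the last byte accessed.
    addr / CACHE_LINE != (addr + len - 1) / CACHE_LINE
}
```

   For example, a 4-byte read at address 60 stays within one line, while the 
same read at address 61 spans two.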


