mapleFU commented on issue #6034: URL: https://github.com/apache/arrow-rs/issues/6034#issuecomment-2248170958
https://godbolt.org/z/czePsTofc : `makeInline2` makes the compiler generate a lot of branches through a switch, which pays off if all strings are likely to have the same length. The previous version compiles to `memset` and `memcpy` calls:

```asm
        call    memset@PLT
        mov     rdi, r14
        mov     rsi, rbx
        mov     rdx, r15
        call    memcpy@PLT
```

On the other hand, `makeInline2` generates a different code path depending on the bit width. For example:

```asm
.LBB1_2:
        movzx   esi, byte ptr [rdi]
        xor     edx, edx
        shl     rsi, 32
        or      rax, rsi
        ret
.LBB1_3:
        movzx   esi, word ptr [rdi]
        xor     edx, edx
        shl     rsi, 32
        or      rax, rsi
        ret
```

If branch prediction works well, the same path would be taken again and again.

> some string length are fast, e.g., 4 and 8, because they align with register width.

AFAIK, if the same instruction is used and the access does not cross a cache line, a memory-copying instruction should not suffer from unaligned access? (Maybe I'm wrong.)
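To make the comparison above concrete, here is a minimal Rust sketch of the two strategies. It is not the code behind the godbolt link, which the comment does not reproduce. The names `make_inline_memcpy`, `make_inline_switch`, and `inline_n` are hypothetical, and the layout (length in the low 32 bits of a 128-bit value, string bytes directly after) is an assumption chosen to match the `shl rsi, 32` / `or rax, rsi` pattern in the assembly. Whether a compiler actually lowers either function to the assembly shown depends on the target and optimization level.

```rust
/// Runtime-length variant: zero a 16-byte buffer, then copy `data.len()` bytes.
/// Because the length is only known at run time, a compiler may lower this to
/// generic zeroing and copying routines.
fn make_inline_memcpy(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "only strings of up to 12 bytes fit inline");
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

/// Fixed-length helper: `N` is a compile-time constant, so each instantiation
/// copies a known number of bytes. Only instantiated here for N in 0..=12.
fn inline_n<const N: usize>(data: &[u8]) -> u128 {
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(N as u32).to_le_bytes());
    bytes[4..4 + N].copy_from_slice(&data[..N]);
    u128::from_le_bytes(bytes)
}

/// Switch variant: branch on the length once, then run a fixed-size copy.
fn make_inline_switch(data: &[u8]) -> u128 {
    match data.len() {
        0 => inline_n::<0>(data),
        1 => inline_n::<1>(data),
        2 => inline_n::<2>(data),
        3 => inline_n::<3>(data),
        4 => inline_n::<4>(data),
        5 => inline_n::<5>(data),
        6 => inline_n::<6>(data),
        7 => inline_n::<7>(data),
        8 => inline_n::<8>(data),
        9 => inline_n::<9>(data),
        10 => inline_n::<10>(data),
        11 => inline_n::<11>(data),
        12 => inline_n::<12>(data),
        _ => panic!("only strings of up to 12 bytes fit inline"),
    }
}

fn main() {
    // Both variants must produce the same bits for every inline length.
    let samples: [&[u8]; 5] = [b"", b"a", b"ab", b"abcd", b"twelve bytes"];
    for s in samples {
        assert_eq!(make_inline_memcpy(s), make_inline_switch(s));
        println!("len {:2} -> {:#034x}", s.len(), make_inline_switch(s));
    }
}
```

If the input lengths are mostly uniform, the `match` in `make_inline_switch` keeps taking the same arm, which is the branch-prediction-friendly case described above. With widely varying lengths the branch becomes harder to predict, so any comparison between the two variants should be measured rather than assumed.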
