cyb70289 opened a new pull request, #13394: URL: https://github.com/apache/arrow/pull/13394
This came up while evaluating the Arm Statistical Profiling Extension (SPE) on Neoverse-N1 with the Arrow CSV writer benchmark. With SPE, we can pinpoint the machine instructions that cause heavy cache misses, or the exact branch that suffers from a high misprediction rate.

The CSV writer benchmark shows a relatively high branch misprediction rate inside glibc's `memcpy`. The offending branch compares the copy size against 8 and runs a different copy routine depending on the result.

The Arrow CSV writer populates one column at a time. Its inner loop does two `memcpy` calls per row. The first copies the field value (a string) from the Arrow string array. The second appends the delimiter, or the end-of-line sequence for the last column, which is at most 2 chars. Values within one column are usually of similar size, but because a very short delimiter is copied after every field, the size passed to `memcpy` keeps alternating between the field length and 1 or 2 bytes. This appears to make it harder for the CPU to predict the correct code path inside `memcpy`.

This PR adds a trivial `copy_endchars` routine that copies only the delimiter and end-of-line. It improves performance significantly on Arm Neoverse-N1, and Skylake also sees a mild improvement.
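To make the idea concrete, here is a minimal sketch of the pattern described above. It is not the code from this PR: `CopyEndChars` and `WriteColumn` are hypothetical names, and the sketch assumes the delimiter or end-of-line sequence is 1 or 2 bytes. Field values still go through `memcpy`, while the tiny end-chars copy uses explicit byte stores, so `memcpy` only sees the field sizes, which tend to be similar within a column. Explicit stores are used instead of a plain byte loop because optimizing compilers can recognise simple copy loops and turn them back into `memcpy` calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not Arrow's actual code): copy the short delimiter or
// end-of-line sequence, assumed to be 1 or 2 bytes, with explicit byte stores
// so these tiny copies do not go through memcpy's size-dispatch branches.
inline char* CopyEndChars(char* dst, const char* src, std::size_t n) {
  switch (n) {
    case 2:
      dst[1] = src[1];
      [[fallthrough]];
    case 1:
      dst[0] = src[0];
      [[fallthrough]];
    case 0:
      break;
    default:
      // Longer sequences are not expected here; fall back to memcpy.
      std::memcpy(dst, src, n);
  }
  return dst + n;
}

// Hypothetical single-column writer mirroring the inner loop described above:
// one memcpy for the field value, then a short end-chars copy per row.
std::string WriteColumn(const std::vector<std::string>& values,
                        const std::string& end_chars) {
  std::size_t total = 0;
  for (const auto& v : values) total += v.size() + end_chars.size();
  std::string out(total, '\0');
  char* cursor = out.data();  // non-const data() requires C++17
  for (const auto& v : values) {
    std::memcpy(cursor, v.data(), v.size());  // field value: size varies by row
    cursor += v.size();
    cursor = CopyEndChars(cursor, end_chars.data(), end_chars.size());
  }
  assert(cursor == out.data() + out.size());  // every byte was written
  return out;
}
```

For example, `WriteColumn({"a", "bc"}, "\n")` yields `"a\nbc\n"`. Whether this helps in practice depends on the CPU's branch predictor and the libc `memcpy` implementation, so it should be measured with the benchmark rather than assumed.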
