cyb70289 opened a new pull request, #13394:
URL: https://github.com/apache/arrow/pull/13394

   This was found while evaluating the Arm Statistical Profiling Extension
   (SPE) on Neoverse-N1 with the Arrow CSV writer benchmark. With SPE, we
   can precisely locate the machine code causing heavy cache misses, or the
   exact branch suffering from a high misprediction rate.
   
   The CSV writer benchmark reveals a relatively high branch misprediction
   rate inside the glibc `memcpy` function. The branch compares the buffer
   size against 8 and runs different copy code depending on the result.
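   To make that branch concrete, here is a simplified, hypothetical C++
   sketch of a copy routine that dispatches on size the same way.
   `SizeDispatchedCopy` is an illustrative name and this is not glibc's
   `memcpy`, whose real implementation is considerably more involved. When
   calls alternate between field values of 8 bytes or more and a 1-2 byte
   delimiter, the `n < 8` branch flips direction on nearly every call, a
   pattern that branch predictors handle poorly.

   ```cpp
   #include <cstddef>
   #include <cstdint>
   #include <cstring>

   // Hypothetical illustration only, NOT glibc's memcpy: one path for copies
   // shorter than 8 bytes, another for copies of 8 bytes or more.
   void SizeDispatchedCopy(char* dst, const char* src, std::size_t n) {
     if (n < 8) {
       // Short path: byte-by-byte copy.
       for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
     } else {
       // Long path: copy whole 8-byte words first...
       std::size_t i = 0;
       for (; i + 8 <= n; i += 8) {
         std::uint64_t word;
         std::memcpy(&word, src + i, 8);
         std::memcpy(dst + i, &word, 8);
       }
       // ...then the remaining tail bytes one at a time.
       for (; i < n; ++i) dst[i] = src[i];
     }
   }
   ```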
   
   The Arrow CSV writer populates one column at a time. In the inner loop,
   it calls `memcpy` twice. The first call copies the field value (a string)
   from the Arrow string array. The second appends the delimiter, or the
   end-of-line if this is the last column, which is at most 2 chars. Data in
   the same column is normally of similar size, but because a very short
   delimiter is copied after each data field, the size passed to `memcpy`
   changes on every call, which appears to make it harder for the CPU to
   predict the correct code path inside `memcpy`.
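   A minimal sketch of this access pattern, assuming the column's values are
   already available as `std::string`s and that each row's write position is
   tracked in a hypothetical `row_offsets` vector (the real Arrow writer's
   data structures differ). A comma delimiter and a two-byte `\r\n`
   end-of-line are assumed, so the second copy is 1 or 2 bytes:

   ```cpp
   #include <cstddef>
   #include <cstring>
   #include <string>
   #include <vector>

   // Hypothetical column-at-a-time writer. row_offsets[r] is where row r's
   // next byte goes in `out`. Each field costs two memcpy calls, so the size
   // passed to memcpy alternates between "field length" and "1-2 bytes".
   void WriteColumn(const std::vector<std::string>& values, bool last_column,
                    std::vector<std::size_t>& row_offsets, char* out) {
     const char* end_chars = last_column ? "\r\n" : ",";
     const std::size_t end_len = last_column ? 2 : 1;
     for (std::size_t r = 0; r < values.size(); ++r) {
       char* dst = out + row_offsets[r];
       std::memcpy(dst, values[r].data(), values[r].size());     // field value
       std::memcpy(dst + values[r].size(), end_chars, end_len);  // delimiter/EOL
       row_offsets[r] += values[r].size() + end_len;
     }
   }
   ```

   Because rows are written column by column, each row needs its own write
   position; here the offsets are assumed to be precomputed from the total
   row lengths.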
   
   This PR adds a trivial `copy_endchars` routine that copies only the
   delimiter and end-of-line, so `memcpy` is left to copy the field values.
   It improves performance significantly on Arm Neoverse-N1; Skylake also
   sees a mild improvement.
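   The idea can be sketched as follows. `CopyEndChars` is a hypothetical
   stand-in for the PR's `copy_endchars`; its name, signature and body are
   assumptions, not the code in the PR. Since the end chars are at most 2
   bytes, a couple of plain stores replace the second `memcpy` call, so
   `memcpy` only sees field values of similar length. The remaining
   `len == 2` check takes the same direction for every row of a column, so
   it should be easy to predict.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstring>
   #include <string>
   #include <vector>

   // Hypothetical helper: copies a 1- or 2-byte delimiter/end-of-line with
   // plain stores instead of calling memcpy.
   inline void CopyEndChars(char* dst, const char* src, std::size_t len) {
     assert(len == 1 || len == 2);
     dst[0] = src[0];
     if (len == 2) dst[1] = src[1];  // same outcome for every row of a column
   }

   // Hypothetical column-at-a-time loop: only the field value goes through
   // memcpy; the end chars (assumed "," or "\r\n") use CopyEndChars.
   void WriteColumnFast(const std::vector<std::string>& values,
                        bool last_column,
                        std::vector<std::size_t>& row_offsets, char* out) {
     const char* end_chars = last_column ? "\r\n" : ",";
     const std::size_t end_len = last_column ? 2 : 1;
     for (std::size_t r = 0; r < values.size(); ++r) {
       char* dst = out + row_offsets[r];
       std::memcpy(dst, values[r].data(), values[r].size());
       CopyEndChars(dst + values[r].size(), end_chars, end_len);
       row_offsets[r] += values[r].size() + end_len;
     }
   }
   ```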


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.