[GitHub] [arrow] cyb70289 commented on pull request #11828: ARROW-14940: [C++] Speed up CSV parser with long CSV cells

GitBox Fri, 03 Dec 2021 03:18:03 -0800


cyb70289 commented on pull request #11828:
URL: https://github.com/apache/arrow/pull/11828#issuecomment-985436637



   Annoyingly, I find that this PR actually causes a big regression on 
Apple M1.
   
   I had been testing on bare-metal M1 with AppleClang (cmake prints "The CXX 
compiler identification is AppleClang 12.0.0.12000032").
   
   Today I set up an Ubuntu 20.04 Docker container on the M1 and tested inside 
it with the clang-12 shipped by Ubuntu. A big regression is observed:
   ```
   -----------------------------------------------------------------
   Regressions: (5)
   -----------------------------------------------------------------
                 benchmark        baseline       contender  change %
     ParseCSVStocksExample   1.454 GiB/sec   1.108 GiB/sec   -23.794
   ParseCSVVehiclesExample   1.569 GiB/sec   1.058 GiB/sec   -32.539
    ParseCSVFlightsExample 791.810 MiB/sec 520.000 MiB/sec   -34.328
       ParseCSVQuotedBlock 931.009 MiB/sec 573.038 MiB/sec   -38.450
      ParseCSVEscapedBlock 931.376 MiB/sec 533.834 MiB/sec   -42.683
   ```
   
   The baseline performance also differs a lot between the two compilers. The 
clang-built binary is much faster than the AppleClang one, and the clang 
result looks more reasonable IMO.
   
   I'm not familiar with macOS. I guess there might be additional optimization 
options to tune for AppleClang besides -O3, as I find that AppleClang compiles 
much faster than clang.
   
   Given that this PR makes no big difference on Arm Neoverse N1, I think we 
can disable the bloom filter approach on Arm.
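
For illustration only, here is a minimal sketch of what disabling the bloom filter at compile time on Arm could look like. It is not the code from this PR: `kUseBloomFilter`, `SpecialCharFilter`, `FindSpecialImpl` and `FindSpecial` are hypothetical names, and the filter is assumed to be a single 64-bit mask indexed by `c & 63`, followed by an exact check to reject false positives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Hypothetical switch (not an Arrow macro): turn the filter off on Arm builds.
#if defined(__aarch64__) || defined(__arm__) || defined(_M_ARM64)
constexpr bool kUseBloomFilter = false;
#else
constexpr bool kUseBloomFilter = true;
#endif

// Assumed filter layout: one bit per (byte value mod 64), plus an exact table.
class SpecialCharFilter {
 public:
  SpecialCharFilter(std::initializer_list<char> specials) {
    for (char c : specials) {
      const auto u = static_cast<std::uint8_t>(c);
      exact_[u] = true;
      mask_ |= std::uint64_t{1} << (u & 63);
    }
  }

  // Cheap test; may report false positives, never false negatives.
  bool MaybeSpecial(std::uint8_t c) const { return ((mask_ >> (c & 63)) & 1) != 0; }

  // Exact test against the configured special characters.
  bool IsSpecial(std::uint8_t c) const { return exact_[c]; }

 private:
  std::uint64_t mask_ = 0;
  bool exact_[256] = {};
};

// Returns the index of the first special character, or `size` if none is found.
// With UseFilter == false the pre-check is compiled out entirely.
template <bool UseFilter>
std::size_t FindSpecialImpl(const SpecialCharFilter& filter, const char* data,
                            std::size_t size) {
  assert(data != nullptr || size == 0);
  for (std::size_t i = 0; i < size; ++i) {
    const auto c = static_cast<std::uint8_t>(data[i]);
    if (UseFilter && !filter.MaybeSpecial(c)) continue;  // fast skip
    if (filter.IsSpecial(c)) return i;
  }
  return size;
}

// Entry point that picks the variant for the target architecture.
inline std::size_t FindSpecial(const SpecialCharFilter& filter, const char* data,
                               std::size_t size) {
  return FindSpecialImpl<kUseBloomFilter>(filter, data, size);
}
```

Both variants return the same positions, so the switch only changes speed, not parsing results. Keeping the choice in a template parameter would also let a benchmark compare the two paths on the same machine.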


