cyb70289 commented on pull request #11458: URL: https://github.com/apache/arrow/pull/11458#issuecomment-946408775
I'm a bit surprised that gcc is much slower (~0.5x) than clang in the Add256 and Multiply256 tests on Xeon Gold 5218. No obvious difference between gcc and clang is observed on arm64 Neoverse N1.

clang-10, xeon gold 5218
```
----------------------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------
FromString                     353 ns          353 ns      1985996 items_per_second=17.011M/s
ToString                       432 ns          432 ns      1621880 items_per_second=13.8972M/s
BinaryMathOpAdd128            23.1 ns         23.1 ns     30187024 items_per_second=432.948M/s
BinaryMathOpMultiply128       38.1 ns         38.1 ns     18345407 items_per_second=262.42M/s
BinaryMathOpDivide128          471 ns          471 ns      1499617 items_per_second=21.2288M/s
BinaryMathOpAdd256            36.3 ns         36.3 ns     19154265 items_per_second=275.707M/s
BinaryMathOpMultiply256        106 ns          106 ns      6580549 items_per_second=93.995M/s
BinaryMathOpDivide256          826 ns          826 ns       839848 items_per_second=12.1013M/s
BinaryMathOpAggregate          223 ns          223 ns      3138443 items_per_second=44.8313M/s
BinaryCompareOp               34.7 ns         34.7 ns     20177291 items_per_second=288.247M/s
BinaryCompareOpConstant       31.9 ns         31.9 ns     21965275 items_per_second=313.771M/s
UnaryOp                       24.7 ns         24.7 ns     28309678 items_per_second=404.377M/s
Constants                     8.18 ns         8.18 ns     85797082 items_per_second=244.593M/s
BinaryBitOp                   23.0 ns         23.0 ns     30372604 items_per_second=434.095M/s
```

gcc-9.3, xeon gold 5218
```
----------------------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------
FromString                     306 ns          306 ns      2287193 items_per_second=19.6159M/s
ToString                       366 ns          366 ns      1911365 items_per_second=16.38M/s
BinaryMathOpAdd128            25.8 ns         25.8 ns     27128130 items_per_second=387.526M/s
BinaryMathOpMultiply128       50.9 ns         50.9 ns     13754093 items_per_second=196.502M/s
BinaryMathOpDivide128          534 ns          534 ns      1311098 items_per_second=18.7322M/s
BinaryMathOpAdd256            89.9 ns         89.9 ns      7778370 items_per_second=111.177M/s
BinaryMathOpMultiply256        237 ns          237 ns      2959045 items_per_second=42.2758M/s
BinaryMathOpDivide256          833 ns          833 ns       840766 items_per_second=12.0099M/s
BinaryMathOpAggregate          259 ns          259 ns      2707564 items_per_second=38.6741M/s
BinaryCompareOp               32.2 ns         32.2 ns     21941410 items_per_second=310.521M/s
BinaryCompareOpConstant       28.5 ns         28.5 ns     24753989 items_per_second=350.707M/s
UnaryOp                       25.8 ns         25.8 ns     27129232 items_per_second=387.547M/s
Constants                     7.88 ns         7.88 ns     88784342 items_per_second=253.659M/s
BinaryBitOp                   26.9 ns         26.9 ns     26023211 items_per_second=371.79M/s
```
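As a quick sanity check on the ~0.5x figure, the gcc/clang throughput ratio can be computed from the `items_per_second` counters of the two Xeon runs. This is only a minimal Python sketch: the dictionary and function names are illustrative and are not part of the Arrow benchmark suite, and the numbers are copied by hand from the output above.

```python
# items_per_second (in M/s) copied from the two Xeon benchmark runs.
# Only the 128-bit and 256-bit math benchmarks are included here.
CLANG_10 = {
    "BinaryMathOpAdd128": 432.948,
    "BinaryMathOpMultiply128": 262.42,
    "BinaryMathOpDivide128": 21.2288,
    "BinaryMathOpAdd256": 275.707,
    "BinaryMathOpMultiply256": 93.995,
    "BinaryMathOpDivide256": 12.1013,
}
GCC_9_3 = {
    "BinaryMathOpAdd128": 387.526,
    "BinaryMathOpMultiply128": 196.502,
    "BinaryMathOpDivide128": 18.7322,
    "BinaryMathOpAdd256": 111.177,
    "BinaryMathOpMultiply256": 42.2758,
    "BinaryMathOpDivide256": 12.0099,
}


def throughput_ratio(candidate, baseline):
    """Return candidate/baseline throughput for every benchmark in both runs."""
    return {
        name: candidate[name] / baseline[name]
        for name in baseline
        if name in candidate
    }


if __name__ == "__main__":
    # A ratio below 1.0 means gcc is slower than clang on that benchmark.
    for name, ratio in throughput_ratio(GCC_9_3, CLANG_10).items():
        print(f"{name:<26} gcc/clang = {ratio:.2f}")
```

With these inputs the ratio comes out at roughly 0.40 for Add256 and 0.45 for Multiply256, while Divide256 stays close to 1.0, which matches the observation that the gap is concentrated in the 256-bit add and multiply paths.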
