klion26 commented on PR #9689: URL: https://github.com/apache/arrow-rs/pull/9689#issuecomment-4450609206
@alamb @scovich After some experiments, it seems that the regression comes from compilation settings and CPU architecture. Setting the rustc codegen option [`-C codegen-units=1`](https://doc.rust-lang.org/rustc/codegen-options/index.html#codegen-units) or [`-C target-cpu=native`](https://doc.rust-lang.org/rustc/codegen-options/index.html#target-cpu) improves the performance, and with `codegen-units=1` on both main and the current branch the performance is the same.

- Graviton can't report the `l1i_cache_refill,stall_frontend` stats from perf; I verified this with the commands in [1].
- I ran the benchmarks with different settings on main and the current branch. The command looks like `RUSTFLAGS='-C codegen-units=1 -C target-cpu=native' cargo bench --features=arrow,async,test_common,experimental,object_store --bench cast_kernels "cast decimal32 to" -- --save-baseline variant-codegen-units-1-native`
  - main branch with default setting (`main-no-feature` in benchmark group below)
  - main branch with `-C codegen-units=1` (`main-codegen-units-1` in benchmark group below)
  - current branch with `-C codegen-units=1` (`variant-codegen-units-1` in benchmark group below)
  - current branch with `-C target-cpu=native` (`variant-codegen-native` in benchmark group below; native = `neoverse-n1` on the machine used)
  - current branch with `-C codegen-units=1 -C target-cpu=native` (`variant-codegen-units-1-native` in benchmark group below)
  - current branch with default setting (`variant-no-feature` in benchmark group below)

<details><summary>result of different benchmarks</summary>
<p>

```
[ec2-user@ip-172-31-35-37 arrow-rs]$ critcmp codegen-main-no-feature codegen-main-codegen-units-1 codegen-variant-codegen-units-1 codegen-variant-codegen-native codegen-variant-codegen-codegen-units-1-native codegen-variant-no-feature
group                                            main-codegen-units-1       main-no-feature            variant-codegen-units-1-native  variant-codegen-native     variant-codegen-units-1    variant-no-feature
-----                                            --------------------       ---------------            ------------------------------  ----------------------     -----------------------    ------------------
"cast decimal32 to float32"                      1.00  16.2±0.04µs ? ?/sec  1.00  16.2±0.03µs ? ?/sec  1.00  16.2±0.04µs ? ?/sec       1.00  16.2±0.02µs ? ?/sec  1.00  16.2±0.01µs ? ?/sec  1.00  16.3±0.09µs ? ?/sec
"cast decimal32 to float64"                      1.00  16.4±0.09µs ? ?/sec  1.00  16.4±0.07µs ? ?/sec  1.01  16.6±0.05µs ? ?/sec       1.01  16.6±0.06µs ? ?/sec  1.00  16.4±0.02µs ? ?/sec  1.00  16.4±0.08µs ? ?/sec
"cast decimal32 to int16"                        1.00  29.3±0.15µs ? ?/sec  1.17  34.3±0.16µs ? ?/sec  1.12  32.8±1.49µs ? ?/sec       1.13  33.1±0.16µs ? ?/sec  1.00  29.3±0.03µs ? ?/sec  1.13  33.2±0.17µs ? ?/sec
"cast decimal32 to int32"                        1.26  29.3±0.10µs ? ?/sec  1.20  27.9±0.41µs ? ?/sec  1.00  23.3±0.12µs ? ?/sec       1.47  34.3±0.12µs ? ?/sec  1.26  29.3±0.05µs ? ?/sec  1.48  34.6±0.20µs ? ?/sec
"cast decimal32 to int64"                        1.00  23.6±0.09µs ? ?/sec  1.25  29.4±0.14µs ? ?/sec  1.11  26.1±0.09µs ? ?/sec       1.51  35.7±0.24µs ? ?/sec  1.00  23.6±0.04µs ? ?/sec  1.53  36.0±0.32µs ? ?/sec
"cast decimal32 to int8"                         1.00  70.0±1.71µs ? ?/sec  1.05  73.4±1.62µs ? ?/sec  1.01  70.6±1.47µs ? ?/sec       1.09  76.5±1.15µs ? ?/sec  1.10  77.2±1.12µs ? ?/sec  1.11  78.0±1.35µs ? ?/sec
"cast decimal32 to uint16"                       1.00  25.4±0.42µs ? ?/sec  1.50  38.1±0.99µs ? ?/sec  1.00  25.4±0.36µs ? ?/sec       1.30  33.1±0.21µs ? ?/sec  1.00  25.5±0.41µs ? ?/sec  1.31  33.2±0.14µs ? ?/sec
"cast decimal32 to uint32"                       1.00  26.1±0.05µs ? ?/sec  1.05  27.4±0.72µs ? ?/sec  1.00  26.2±0.09µs ? ?/sec       1.31  34.1±0.13µs ? ?/sec  1.00  26.2±0.03µs ? ?/sec  1.32  34.4±0.19µs ? ?/sec
"cast decimal32 to uint64"                       1.15  30.2±0.10µs ? ?/sec  1.12  29.5±0.16µs ? ?/sec  1.00  26.3±0.17µs ? ?/sec       1.35  35.5±0.29µs ? ?/sec  1.00  26.3±0.03µs ? ?/sec  1.37  35.9±0.30µs ? ?/sec
"cast decimal32 to uint8"                        1.00  81.6±1.70µs ? ?/sec  1.05  85.0±1.76µs ? ?/sec  1.00  81.3±1.42µs ? ?/sec       1.06  86.0±1.59µs ? ?/sec  1.09  88.8±1.34µs ? ?/sec  1.10  89.1±1.33µs ? ?/sec
cast decimal32 to decimal32 512                  1.00  14.0±0.05µs ? ?/sec  1.00  14.1±0.08µs ? ?/sec  1.14  16.0±0.43µs ? ?/sec       1.00  14.0±0.03µs ? ?/sec  1.03  14.5±0.27µs ? ?/sec  1.01  14.3±0.12µs ? ?/sec
cast decimal32 to decimal32 512 lower precision  1.00  21.9±0.05µs ? ?/sec  1.01  22.1±0.12µs ? ?/sec  1.08  23.5±0.08µs ? ?/sec       1.00  22.0±0.26µs ? ?/sec  1.04  22.7±0.41µs ? ?/sec  1.08  23.7±0.17µs ? ?/sec
cast decimal32 to decimal64 512                  1.00  9.9±0.02µs  ? ?/sec  1.00  9.9±0.02µs  ? ?/sec  1.00  9.8±0.03µs  ? ?/sec       1.00  9.8±0.02µs  ? ?/sec  1.04  10.3±0.16µs ? ?/sec  1.02  10.0±0.07µs ? ?/sec
```

Another benchmark for `decimal32 to int8/uint8` with `-C codegen-units=1` on both main and the current branch:

```
group                      codegen-unit1-main         codegen-unit1-variant
-----                      ------------------         ---------------------
"cast decimal32 to int8"   1.01  70.9±1.70µs ? ?/sec  1.02  71.8±1.84µs
"cast decimal32 to uint8"  1.00  81.5±1.74µs ? ?/sec  1.00  81.6±1.54µs
```

</p>
</details>

With the default settings, the build uses `codegen-units = 16` on both EC2 and Mac. I checked this by running `cargo clean`, then `CARGO_INCREMENTAL=0 RUSTFLAGS="-C codegen-units=1 -C save-temps" cargo build -p arrow-cast --release -vv`, and then `find target/release/deps -name '*cast*rcgu.o' | wc -l`.

[1]
<details>
<summary>commands showing that EC2 does not provide the specified perf stats</summary>
<p>

1. Run a perf command and check how the events are resolved: `perf stat -v -e cycles,l1i_cache_refill,stall_frontend:u ./main_cast_kernels-589283a85a93d289 --bench 'cast decimal32 to uint8'`. The verbose output shows `Using CPUID 0x00000000410fd0c0`, `l1i_cache_refill -> armv8_pmuv3_0/event=0x1/`, and `stall_frontend -> armv8_pmuv3_0/event=0x23/`.
2. Run `perf stat -vv -e armv8_pmuv3_0/event=0x1/ -- sleep 1` to read the counter directly; the result is always 0.

```
 Performance counter stats for 'sleep 1':

                 0      armv8_pmuv3_0/event=0x1/u

       1.001426993 seconds time elapsed

       0.001400000 seconds user
       0.000000000 seconds sys

 Performance counter stats for 'sleep 1':

                 0      armv8_pmuv3_0/event=0x23/u

       1.001548493 seconds time elapsed

       0.001492000 seconds user
       0.000000000 seconds sys
```

</p>
</details>

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
