Hi,

On Thu, Oct 26, 2017 at 02:43:02PM +0200, Richard Biener wrote:
> On Thu, Oct 26, 2017 at 2:18 PM, Martin Jambor <mjam...@suse.cz> wrote:
> >
> > Nevertheless, I still intend to experiment with the limit, I sent out
> > this RFC exactly so that I don't spend a lot of time benchmarking
> > something that is eventually not deemed acceptable on principle.
>
> I think the limit should be on the number of generated copies and not
> the overall size of the structure...  If the struct were composed of
> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...

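To make the concern quoted above concrete, here is a minimal sketch in plain C. It is not GCC code: it models a type as a toy tree and counts how many scalar copies an element-wise copy would need. The toy_type structure and every toy_* name are made up for illustration; a real implementation would walk GCC's own type trees.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a type tree (hypothetical, not GCC's representation).  */
enum toy_kind { TOY_SCALAR, TOY_ARRAY, TOY_RECORD };

struct toy_type
{
  enum toy_kind kind;
  size_t size;                          /* TOY_SCALAR: size in bytes.  */
  size_t nelts;                         /* Array elements or record fields.  */
  const struct toy_type *elem;          /* TOY_ARRAY: element type.  */
  const struct toy_type *const *fields; /* TOY_RECORD: field types.  */
};

/* Number of scalar load/store pairs an element-wise copy would need.  */
size_t
toy_count_copies (const struct toy_type *t)
{
  assert (t != NULL);
  size_t n = 0;
  switch (t->kind)
    {
    case TOY_SCALAR:
      n = 1;
      break;
    case TOY_ARRAY:
      n = t->nelts * toy_count_copies (t->elem);
      break;
    case TOY_RECORD:
      for (size_t i = 0; i < t->nelts; i++)
        n += toy_count_copies (t->fields[i]);
      break;
    }
  return n;
}

/* Payload size in bytes; padding is ignored in this toy model.  */
size_t
toy_size (const struct toy_type *t)
{
  assert (t != NULL);
  size_t n = 0;
  switch (t->kind)
    {
    case TOY_SCALAR:
      n = t->size;
      break;
    case TOY_ARRAY:
      n = t->nelts * toy_size (t->elem);
      break;
    case TOY_RECORD:
      for (size_t i = 0; i < t->nelts; i++)
        n += toy_size (t->fields[i]);
      break;
    }
  return n;
}

/* Two sample types of equal size: 32 one-byte elements versus a record
   of four 8-byte fields.  */
const struct toy_type toy_byte = { TOY_SCALAR, 1, 0, NULL, NULL };
const struct toy_type toy_word8 = { TOY_SCALAR, 8, 0, NULL, NULL };
const struct toy_type toy_chars32 = { TOY_ARRAY, 0, 32, &toy_byte, NULL };
const struct toy_type *const toy_word8_fields[4]
  = { &toy_word8, &toy_word8, &toy_word8, &toy_word8 };
const struct toy_type toy_four_words
  = { TOY_RECORD, 0, 4, NULL, toy_word8_fields };
```

Both sample types occupy 32 bytes in this model, but toy_count_copies returns 32 for toy_chars32 and 4 for toy_four_words, which is why a size limit alone cannot tell them apart.
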
I have added another parameter to also limit the number of generated
element copies.  I have kept the size limit so that we don't even
attempt to count them for large structures.

> Given that load bandwidth is usually higher than store bandwidth it
> might make sense to do the store combining in our copying sequence,
> like for the 8 byte entry case use sth like
>
>   movq 0(%eax), %xmm0
>   movhps 8(%eax), %xmm0 // or vpinsert
>   mov[au]ps %xmm0, 0(%ebx)

I would be concerned about the cost of GPR->XMM moves when the value
being stored is in a GPR, especially with generic tuning which (with
-O2) is the main thing I am targeting here.  Wouldn't we actually pass
it through the stack with all the associated penalties?

Also, while such store combining might work for ImageMagick, if a
programmer did:

  region1->x = x1;
  region2->x = x2;
  region1->y = 0;
  region2->y = 20;
  ...
  SetPixelCacheNexusPixels(cache_info, ReadMode, region1, ...)

The transformation would not work unless it could prove region1 and
region2 are not the same thing.

> As said a general concern was you not copying padding.  If you
> put this into an even more common place you surely will break
> stuff, no?

I don't understand, what even more common place do you mean?

I have been testing the patch also on a bunch of other architectures
and those have tests in their testsuite that check that padding is
copied, for example some tests in gcc.target/aarch64/aapcs64/ check
whether a structure passed to a function is binary the same as the
original, and the tests fail because of padding.  That is the only
"breakage" I know about but I believe that the assumption that padding
must always be copied is wrong (if it is not, then we need to make SRA
quite a bit more conservative).

On Thu, Oct 26, 2017 at 05:09:42PM +0200, Richard Biener wrote:
> Also if we do the stores in smaller chunks we are more
> likely hitting the same store-to-load-forwarding issue
> elsewhere.  Like in case the destination is memcpy'ed
> away.
>
> So the proposed change isn't necessarily a win without
> a possible similar regression that it tries to fix.
>

With some encouragement by Honza, I have done some benchmarking anyway
and I did not see anything of that kind.

> Whole-program analysis of accesses might allow
> marking affected objects.

Attempting to save access patterns before IPA and then tracking them
and keeping them in sync across inlining and all gimple late passes
seems like a nightmarish task.  If this approach is indeed rejected I
might attempt to do the store combining but a WPA analysis seems just
too complex.

Anyway, here are the numbers.  They were taken on two different
Zen-based machines.  I am also in the process of measuring at least
something on a Haswell machine but I started later and the machine is
quite a bit slower so I will not have the numbers until next week (and
not all equivalents in any way).  I found out I do not have access to
any more modern .*Lake Intel CPU.

trunk is pristine trunk revision 254205.  All benchmarks were run
three times and the median was chosen.

s or strict means the patch with the strictest possible settings to
speed up ImageMagick, i.e. --param max-size-for-elementwise-copy=32
--param max-insns-for-elementwise-copy=4.  Also run three times.

x1 is patched trunk with the parameters having the default values I
was going to propose, i.e. --param max-size-for-elementwise-copy=35
--param max-insns-for-elementwise-copy=6.  Also run three times.

I then increased the parameters, in search of further missed
opportunities and to see what will start to regress and how soon.

x2 is roughly twice that, --param max-size-for-elementwise-copy=67
--param max-insns-for-elementwise-copy=12.  Run twice, outliers
manually checked.

x4 is roughly four times x1, namely --param
max-size-for-elementwise-copy=143 --param
max-insns-for-elementwise-copy=24.  Run only once.

The times below are of course "non-reportable," for a whole bunch of
reasons.

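As a rough illustration of how the two limits in the configurations above interact, here is a small standalone C sketch. It only mirrors the order of the checks (size first, then the number of copies); toy_limits and toy_use_elementwise_copy are hypothetical names, and the real decision in the patch depends on further conditions on the type.

```c
#include <stdbool.h>

/* One benchmark configuration (hypothetical helper type).  */
struct toy_limits
{
  const char *name;
  int max_size;   /* --param max-size-for-elementwise-copy, in bytes.  */
  int max_insns;  /* --param max-insns-for-elementwise-copy.  */
};

/* The four configurations measured in this mail.  */
const struct toy_limits toy_s  = { "s",  32,  4 };
const struct toy_limits toy_x1 = { "x1", 35,  6 };
const struct toy_limits toy_x2 = { "x2", 67,  12 };
const struct toy_limits toy_x4 = { "x4", 143, 24 };

/* Return true if an aggregate of SIZE_BYTES bytes that needs N_COPIES
   scalar copies passes both limits.  The size check comes first so that
   large aggregates are rejected without counting their elements.  */
bool
toy_use_elementwise_copy (const struct toy_limits *lim, int size_bytes,
                          int n_copies)
{
  if (size_bytes > lim->max_size)
    return false;
  return n_copies <= lim->max_insns;
}
```

Under these assumptions a 32-byte record of four 8-byte fields passes even the strict setting, a 32-byte array of 32 chars is rejected by every setting including x4 (32 copies exceed 24), and a 64-byte record of eight 8-byte fields is rejected by x1 on size but accepted by x2.
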
Zen SPECINT 2006 -O2 generic tuning
====================================

Run-time
--------

| Benchmark      | trunk |   s |     % |  x1 |     % |  x2 |     % |  x4 |     % |
|----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------|
| 400.perlbench  |   237 | 236 | -0.42 | 236 | -0.42 | 238 | +0.42 | 237 | +0.00 |
| 401.bzip2      |   341 | 342 | +0.29 | 341 | +0.00 | 341 | +0.00 | 341 | +0.00 |
| 403.gcc        |   217 | 217 | +0.00 | 217 | +0.00 | 216 | -0.46 | 217 | +0.00 |
| 429.mcf        |   224 | 218 | -2.68 | 223 | -0.45 | 221 | -1.34 | 226 | +0.89 |
| 445.gobmk      |   361 | 361 | +0.00 | 361 | +0.00 | 360 | -0.28 | 363 | +0.55 |
| 456.hmmer      |   296 | 296 | +0.00 | 296 | +0.00 | 297 | +0.34 | 296 | +0.00 |
| 458.sjeng      |   453 | 452 | -0.22 | 454 | +0.22 | 454 | +0.22 | 460 | +1.55 |
| 462.libquantum |   289 | 289 | +0.00 | 291 | +0.69 | 289 | +0.00 | 291 | +0.69 |
| 464.h264ref    |   391 | 391 | +0.00 | 385 | -1.53 | 385 | -1.53 | 385 | -1.53 |
| 471.omnetpp    |   269 | 255 | -5.20 | 250 | -7.06 | 247 | -8.18 | 268 | -0.37 |
| 473.astar      |   320 | 321 | +0.31 | 317 | -0.94 | 320 | +0.00 | 320 | +0.00 |
| 483.xalancbmk  |   187 | 188 | +0.53 | 188 | +0.53 | 187 | +0.00 | 187 | +0.00 |

Although the omnetpp result looks like a sizeable improvement, I should
warn that this is one of the few slightly jumpy benchmarks.  However, I
re-ran it a few more times and it seems to be jumping around a lower
value when compiled with the patched compiler.  It might not be the
5-8% though.

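For anyone re-deriving the figures: the % columns are consistent with a plain relative change against trunk, and each time is described above as the median of the runs.  A minimal sketch, assuming exactly that arithmetic (toy_median3 and toy_percent_change are hypothetical helper names, not part of any benchmarking harness):

```c
/* Median of three run times.  */
int
toy_median3 (int a, int b, int c)
{
  if ((a <= b && b <= c) || (c <= b && b <= a))
    return b;
  if ((b <= a && a <= c) || (c <= a && a <= b))
    return a;
  return c;
}

/* Relative change of PATCHED against BASE in percent; negative values
   mean the patched compiler produced faster (or smaller) code.  */
double
toy_percent_change (int base, int patched)
{
  return 100.0 * (patched - base) / base;
}
```

With the 471.omnetpp medians from the table above, toy_percent_change (269, 250) evaluates to roughly -7.06 and toy_percent_change (269, 255) to roughly -5.20, matching the x1 and s columns.
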
Text size
---------

| Benchmark      |   trunk |  strict |     % |      x1 |     % |      x2 |     % |      x4 |     % |
|----------------+---------+---------+-------+---------+-------+---------+-------+---------+-------|
| 400.perlbench  |  875874 |  875954 | +0.01 |  875954 | +0.01 |  876018 | +0.02 |  876146 | +0.03 |
| 401.bzip2      |   44754 |   44754 | +0.00 |   44754 | +0.00 |   44754 | +0.00 |   44754 | +0.00 |
| 403.gcc        | 2294466 | 2294930 | +0.02 | 2296098 | +0.07 | 2296306 | +0.08 | 2296466 | +0.09 |
| 429.mcf        |    8226 |    8226 | +0.00 |    8226 | +0.00 |    8258 | +0.39 |    8258 | +0.39 |
| 445.gobmk      |  579778 |  579778 | +0.00 |  579826 | +0.01 |  579826 | +0.01 |  580402 | +0.11 |
| 456.hmmer      |  221058 |  221058 | +0.00 |  221058 | +0.00 |  221058 | +0.00 |  221058 | +0.00 |
| 458.sjeng      |   93362 |   93362 | +0.00 |   94882 | +1.63 |   94882 | +1.63 |   96066 | +2.90 |
| 462.libquantum |   28314 |   28314 | +0.00 |   28362 | +0.17 |   28362 | +0.17 |   28362 | +0.17 |
| 464.h264ref    |  393874 |  393874 | +0.00 |  393922 | +0.01 |  393922 | +0.01 |  394226 | +0.09 |
| 471.omnetpp    |  430306 |  430306 | +0.00 |  430418 | +0.03 |  430418 | +0.03 |  430418 | +0.03 |
| 473.astar      |   29362 |   29538 | +0.60 |   29538 | +0.60 |   29554 | +0.65 |   29554 | +0.65 |
| 483.xalancbmk  | 2361298 | 2361506 | +0.01 | 2361506 | +0.01 | 2361506 | +0.01 | 2361506 | +0.01 |


Zen SPECINT 2006 -Ofast native tuning
======================================

Run-time
--------

| Benchmark      | trunk |   s |     % |  x1 |     % |  x2 |     % |  x4 |     % |
|----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------|
| 400.perlbench  |   240 | 239 | -0.42 | 239 | -0.42 | 241 | +0.42 | 238 | -0.83 |
| 401.bzip2      |   341 | 341 | +0.00 | 341 | +0.00 | 341 | +0.00 | 340 | -0.29 |
| 403.gcc        |   210 | 208 | -0.95 | 207 | -1.43 | 209 | -0.48 | 208 | -0.95 |
| 429.mcf        |   225 | 225 | +0.00 | 225 | +0.00 | 228 | +1.33 | 226 | +0.44 |
| 445.gobmk      |   352 | 352 | +0.00 | 352 | +0.00 | 351 | -0.28 | 352 | +0.00 |
| 456.hmmer      |   131 | 131 | +0.00 | 131 | +0.00 | 131 | +0.00 | 131 | +0.00 |
| 458.sjeng      |   442 | 442 | +0.00 | 438 | -0.90 | 438 | -0.90 | 437 | -1.13 |
| 462.libquantum |   291 | 292 | +0.34 | 286 | -1.72 | 287 | -1.37 | 287 | -1.37 |
| 464.h264ref    |   364 | 365 | +0.27 | 364 | +0.00 | 364 | +0.00 | 363 | -0.27 |
| 471.omnetpp    |   266 | 266 | +0.00 | 265 | -0.38 | 265 | -0.38 | 265 | -0.38 |
| 473.astar      |   306 | 307 | +0.33 | 306 | +0.00 | 306 | +0.00 | 309 | +0.98 |
| 483.xalancbmk  |   177 | 173 | -2.26 | 170 | -3.95 | 170 | -3.95 | 170 | -3.95 |

Text size
---------

| Benchmark      |   trunk |  strict |     % |      x1 |     % |      x2 |     % |      x4 |     % |
|----------------+---------+---------+-------+---------+-------+---------+-------+---------+-------|
| 400.perlbench  | 1161762 | 1161874 | +0.01 | 1161874 | +0.01 | 1162226 | +0.04 | 1162338 | +0.05 |
| 401.bzip2      |   80834 |   80834 | +0.00 |   80834 | +0.00 |   80834 | +0.00 |   80834 | +0.00 |
| 403.gcc        | 3170946 | 3171394 | +0.01 | 3172914 | +0.06 | 3173170 | +0.07 | 3174818 | +0.12 |
| 429.mcf        |   10418 |   10418 | +0.00 |   10418 | +0.00 |   10450 | +0.31 |   10450 | +0.31 |
| 445.gobmk      |  779778 |  779778 | +0.00 |  779842 | +0.01 |  779842 | +0.01 |  780418 | +0.08 |
| 456.hmmer      |  328258 |  328258 | +0.00 |  328258 | +0.00 |  328258 | +0.00 |  328258 | +0.00 |
| 458.sjeng      |  146386 |  146386 | +0.00 |  148162 | +1.21 |  148162 | +1.21 |  149330 | +2.01 |
| 462.libquantum |   30666 |   30666 | +0.00 |   30730 | +0.21 |   30730 | +0.21 |   30730 | +0.21 |
| 464.h264ref    |  737826 |  737826 | +0.00 |  737890 | +0.01 |  737890 | +0.01 |  739186 | +0.18 |
| 471.omnetpp    |  561570 |  561570 | +0.00 |  561826 | +0.05 |  561826 | +0.05 |  561826 | +0.05 |
| 473.astar      |   39314 |   39522 | +0.53 |   39522 | +0.53 |   39538 | +0.57 |   39538 | +0.57 |
| 483.xalancbmk  | 3319682 | 3319842 | +0.00 | 3319842 | +0.00 | 3319842 | +0.00 | 3319842 | +0.00 |


Zen SPECFP 2006 -O2 generic tuning
==================================

Run-time
--------

| Benchmark     | trunk |   s |     % |  x1 |     % |  x2 |     % |  x4 |     % |
|---------------+-------+-----+-------+-----+-------+-----+-------+-----+-------|
| 410.bwaves    |   214 | 213 | -0.47 | 214 | +0.00 | 214 | +0.00 | 214 | +0.00 |
| 433.milc      |   290 | 291 | +0.34 | 290 | +0.00 | 295 | +1.72 | 289 | -0.34 |
| 434.zeusmp    |   182 | 182 | +0.00 | 182 | +0.00 | 184 | +1.10 | 182 | +0.00 |
| 435.gromacs   |   218 | 218 | +0.00 | 217 | -0.46 | 216 | -0.92 | 220 | +0.92 |
| 436.cactusADM |   350 | 349 | -0.29 | 349 | -0.29 | 343 | -2.00 | 349 | -0.29 |
| 437.leslie3d  |   196 | 195 | -0.51 | 196 | +0.00 | 194 | -1.02 | 196 | +0.00 |
| 444.namd      |   273 | 273 | +0.00 | 273 | +0.00 | 273 | +0.00 | 273 | +0.00 |
| 447.dealII    |   211 | 211 | +0.00 | 210 | -0.47 | 210 | -0.47 | 211 | +0.00 |
| 450.soplex    |   187 | 188 | +0.53 | 188 | +0.53 | 187 | +0.00 | 187 | +0.00 |
| 453.povray    |   119 | 118 | -0.84 | 119 | +0.00 | 119 | +0.00 | 118 | -0.84 |
| 454.calculix  |   534 | 533 | -0.19 | 531 | -0.56 | 531 | -0.56 | 532 | -0.37 |
| 459.GemsFDTD  |   236 | 235 | -0.42 | 235 | -0.42 | 242 | +2.54 | 237 | +0.42 |
| 465.tonto     |   366 | 365 | -0.27 | 365 | -0.27 | 364 | -0.55 | 365 | -0.27 |
| 470.lbm       |   181 | 180 | -0.55 | 180 | -0.55 | 180 | -0.55 | 180 | -0.55 |
| 481.wrf       |   303 | 303 | +0.00 | 302 | -0.33 | 304 | +0.33 | 304 | +0.33 |
| 482.sphinx3   |   362 | 362 | +0.00 | 360 | -0.55 | 361 | -0.28 | 363 | +0.28 |

Text size
---------

| Benchmark     |   trunk |  strict |     % |      x1 |     % |      x2 |     % |      x4 |     % |
|---------------+---------+---------+-------+---------+-------+---------+-------+---------+-------|
| 410.bwaves    |   25954 |   25954 | +0.00 |   25954 | +0.00 |   25954 | +0.00 |   25954 | +0.00 |
| 433.milc      |   87922 |   87922 | +0.00 |   87922 | +0.00 |   88610 | +0.78 |   89042 | +1.27 |
| 434.zeusmp    |  212034 |  212034 | +0.00 |  212034 | +0.00 |  212034 | +0.00 |  212034 | +0.00 |
| 435.gromacs   |  747026 |  747026 | +0.00 |  747026 | +0.00 |  747026 | +0.00 |  747026 | +0.00 |
| 436.cactusADM |  526178 |  526178 | +0.00 |  526178 | +0.00 |  526274 | +0.02 |  526274 | +0.02 |
| 437.leslie3d  |   83234 |   83234 | +0.00 |   83234 | +0.00 |   83234 | +0.00 |   83234 | +0.00 |
| 444.namd      |  297234 |  297266 | +0.01 |  297266 | +0.01 |  297266 | +0.01 |  297266 | +0.01 |
| 447.dealII    | 2165282 | 2167650 | +0.11 | 2172290 | +0.32 | 2174034 | +0.40 | 2174082 | +0.41 |
| 450.soplex    |  347122 |  347122 | +0.00 |  347122 | +0.00 |  347122 | +0.00 |  347122 | +0.00 |
| 453.povray    |  800914 |  800962 | +0.01 |  801570 | +0.08 |  802002 | +0.14 |  803138 | +0.28 |
| 454.calculix  | 1342802 | 1342802 | +0.00 | 1342802 | +0.00 | 1342802 | +0.00 | 1342802 | +0.00 |
| 459.GemsFDTD  |  353410 |  354050 | +0.18 |  354050 | +0.18 |  354050 | +0.18 |  354098 | +0.19 |
| 465.tonto     | 3464210 | 3465058 | +0.02 | 3465058 | +0.02 | 3468434 | +0.12 | 3476594 | +0.36 |
| 470.lbm       |    9202 |    9202 | +0.00 |    9202 | +0.00 |    9202 | +0.00 |    9202 | +0.00 |
| 481.wrf       | 3345170 | 3345170 | +0.00 | 3345170 | +0.00 | 3351586 | +0.19 | 3351586 | +0.19 |
| 482.sphinx3   |  125026 |  125026 | +0.00 |  125026 | +0.00 |  125026 | +0.00 |  125026 | +0.00 |


Zen SPECFP 2006 -Ofast native tuning
====================================

Run-time
--------

| Benchmark     | trunk |   s |     % |  x1 |     % |  x2 |     % |  x4 |     % |
|---------------+-------+-----+-------+-----+-------+-----+-------+-----+-------|
| 410.bwaves    |   151 | 150 | -0.66 | 151 | +0.00 | 151 | +0.00 | 151 | +0.00 |
| 433.milc      |   197 | 197 | +0.00 | 197 | +0.00 | 194 | -1.52 | 186 | -5.58 |
| 434.zeusmp    |   128 | 128 | +0.00 | 128 | +0.00 | 128 | +0.00 | 128 | +0.00 |
| 435.gromacs   |   181 | 181 | +0.00 | 180 | -0.55 | 180 | -0.55 | 181 | +0.00 |
| 436.cactusADM |   139 | 139 | +0.00 | 139 | +0.00 | 132 | -5.04 | 139 | +0.00 |
| 437.leslie3d  |   159 | 160 | +0.63 | 160 | +0.63 | 159 | +0.00 | 159 | +0.00 |
| 444.namd      |   256 | 256 | +0.00 | 255 | -0.39 | 255 | -0.39 | 256 | +0.00 |
| 447.dealII    |   200 | 200 | +0.00 | 199 | -0.50 | 201 | +0.50 | 201 | +0.50 |
| 450.soplex    |   184 | 184 | +0.00 | 185 | +0.54 | 184 | +0.00 | 184 | +0.00 |
| 453.povray    |   124 | 122 | -1.61 | 123 | -0.81 | 124 | +0.00 | 122 | -1.61 |
| 454.calculix  |   192 | 192 | +0.00 | 192 | +0.00 | 193 | +0.52 | 193 | +0.52 |
| 459.GemsFDTD  |   208 | 208 | +0.00 | 208 | +0.00 | 214 | +2.88 | 208 | +0.00 |
| 465.tonto     |   320 | 320 | +0.00 | 320 | +0.00 | 320 | +0.00 | 320 | +0.00 |
| 470.lbm       |   142 | 142 | +0.00 | 142 | +0.00 | 142 | +0.00 | 142 | +0.00 |
| 481.wrf       |   195 | 195 | +0.00 | 195 | +0.00 | 195 | +0.00 | 195 | +0.00 |
| 482.sphinx3   |   256 | 258 | +0.78 | 256 | +0.00 | 256 | +0.00 | 257 | +0.39 |

Text size
---------

| Benchmark     |   trunk |  strict |     % |      x1 |     % |      x2 |     % |      x4 |     % |
|---------------+---------+---------+-------+---------+-------+---------+-------+---------+-------|
| 410.bwaves    |   27490 |   27490 | +0.00 |   27490 | +0.00 |   27490 | +0.00 |   27490 | +0.00 |
| 433.milc      |  118178 |  118178 | +0.00 |  118178 | +0.00 |  118962 | +0.66 |  119634 | +1.23 |
| 434.zeusmp    |  411106 |  411106 | +0.00 |  411106 | +0.00 |  411106 | +0.00 |  411106 | +0.00 |
| 435.gromacs   |  935970 |  935970 | +0.00 |  935970 | +0.00 |  935970 | +0.00 |  936162 | +0.02 |
| 436.cactusADM |  750546 |  750546 | +0.00 |  750546 | +0.00 |  750626 | +0.01 |  750626 | +0.01 |
| 437.leslie3d  |  123410 |  123410 | +0.00 |  123410 | +0.00 |  123410 | +0.00 |  123410 | +0.00 |
| 444.namd      |  284082 |  284114 | +0.01 |  284114 | +0.01 |  284114 | +0.01 |  284114 | +0.01 |
| 447.dealII    | 2438610 | 2440946 | +0.10 | 2444978 | +0.26 | 2446882 | +0.34 | 2446930 | +0.34 |
| 450.soplex    |  443218 |  443218 | +0.00 |  443218 | +0.00 |  443218 | +0.00 |  443218 | +0.00 |
| 453.povray    | 1077778 | 1077890 | +0.01 | 1078658 | +0.08 | 1079026 | +0.12 | 1080370 | +0.24 |
| 454.calculix  | 1639138 | 1639138 | +0.00 | 1639138 | +0.00 | 1639474 | +0.02 | 1639474 | +0.02 |
| 459.GemsFDTD  |  451202 |  451234 | +0.01 |  451234 | +0.01 |  451234 | +0.01 |  451282 | +0.02 |
| 465.tonto     | 4584690 | 4585250 | +0.01 | 4585250 | +0.01 | 4588130 | +0.08 | 4595442 | +0.23 |
| 470.lbm       |    9858 |    9858 | +0.00 |    9858 | +0.00 |    9858 | +0.00 |    9858 | +0.00 |
| 481.wrf       | 4588002 | 4588002 | +0.00 | 4588290 | +0.01 | 4621010 | +0.72 | 4621922 | +0.74 |
| 482.sphinx3   |  179602 |  179602 | +0.00 |  179602 | +0.00 |  179602 | +0.00 |  179602 | +0.00 |


Zen SPEC INT 2017 -O2 generic tuning
====================================

Run-time
--------

| Benchmark       | trunk |   s |     % |  x1 |     % |  x2 |     % |  x4 |     % |
|-----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------|
| 500.perlbench_r |   529 | 529 | +0.00 | 531 | +0.38 | 530 | +0.19 | 534 | +0.95 |
| 502.gcc_r       |   338 | 333 | -1.48 | 334 | -1.18 | 339 | +0.30 | 339 | +0.30 |
| 505.mcf_r       |   382 | 381 | -0.26 | 382 | +0.00 | 382 | +0.00 | 381 | -0.26 |
| 520.omnetpp_r   |   511 | 503 | -1.57 | 497 | -2.74 | 497 | -2.74 | 497 | -2.74 |
| 523.xalancbmk_r |   391 | 388 | -0.77 | 389 | -0.51 | 390 | -0.26 | 391 | +0.00 |
| 525.x264_r      |   590 | 590 | +0.00 | 591 | +0.17 | 592 | +0.34 | 593 | +0.51 |
| 531.deepsjeng_r |   427 | 427 | +0.00 | 427 | +0.00 | 428 | +0.23 | 427 | +0.00 |
| 541.leela_r     |   716 | 716 | +0.00 | 716 | +0.00 | 719 | +0.42 | 719 | +0.42 |
| 548.exchange2_r |   593 | 593 | +0.00 | 593 | +0.00 | 593 | +0.00 | 593 | +0.00 |
| 557.xz_r        |   452 | 452 | +0.00 | 453 | +0.22 | 454 | +0.44 | 452 | +0.00 |

Text size
---------

| Benchmark       |   trunk |  strict |     % |      x1 |     % |      x2 |     % |      x4 |     % |
|-----------------+---------+---------+-------+---------+-------+---------+-------+---------+-------|
| 500.perlbench_r | 1599442 | 1599522 | +0.01 | 1599522 | +0.01 | 1599522 | +0.01 | 1600082 | +0.04 |
| 502.gcc_r       | 6757602 | 6758978 | +0.02 | 6759090 | +0.02 | 6759842 | +0.03 | 6760306 | +0.04 |
| 505.mcf_r       |   16098 |   16098 | +0.00 |   16098 | +0.00 |   16098 | +0.00 |   16306 | +1.29 |
| 520.omnetpp_r   | 1262498 | 1262562 | +0.01 | 1264034 | +0.12 | 1264034 | +0.12 | 1264034 | +0.12 |
| 523.xalancbmk_r | 3989026 | 3989202 | +0.00 | 3989202 | +0.00 | 3989202 | +0.00 | 3989202 | +0.00 |
| 525.x264_r      |  414130 |  414194 | +0.02 |  414194 | +0.02 |  414738 | +0.15 |  415122 | +0.24 |
| 531.deepsjeng_r |   67426 |   67426 | +0.00 |   67458 | +0.05 |   67458 | +0.05 |   67458 | +0.05 |
| 541.leela_r     |  219378 |  219378 | +0.00 |  219378 | +0.00 |  224082 | +2.14 |  237026 | +8.04 |
| 548.exchange2_r |   61234 |   61234 | +0.00 |   61234 | +0.00 |   61234 | +0.00 |   61234 | +0.00 |
| 557.xz_r        |  111490 |  111490 | +0.00 |  111490 | +0.00 |  111506 | +0.01 |  111890 | +0.36 |


Zen SPEC INT 2017 -Ofast native tuning
======================================

Run-time
--------

| Benchmark       | trunk |   s |     % |  x1 |     % |  x2 |     % |  x4 |     % |
|-----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------|
| 500.perlbench_r |   525 | 524 | -0.19 | 525 | +0.00 | 525 | +0.00 | 534 | +1.71 |
| 502.gcc_r       |   331 | 329 | -0.60 | 324 | -2.11 | 330 | -0.30 | 324 | -2.11 |
| 505.mcf_r       |   380 | 380 | +0.00 | 381 | +0.26 | 380 | +0.00 | 379 | -0.26 |
| 520.omnetpp_r   |   487 | 486 | -0.21 | 488 | +0.21 | 489 | +0.41 | 488 | +0.21 |
| 523.xalancbmk_r |   373 | 369 | -1.07 | 367 | -1.61 | 370 | -0.80 | 368 | -1.34 |
| 525.x264_r      |   319 | 319 | +0.00 | 320 | +0.31 | 321 | +0.63 | 322 | +0.94 |
| 531.deepsjeng_r |   418 | 418 | +0.00 | 418 | +0.00 | 418 | +0.00 | 419 | +0.24 |
| 541.leela_r     |   674 | 674 | +0.00 | 674 | +0.00 | 672 | -0.30 | 672 | -0.30 |
| 548.exchange2_r |   466 | 466 | +0.00 | 466 | +0.00 | 466 | +0.00 | 466 | +0.00 |
| 557.xz_r        |   443 | 443 | +0.00 | 443 | +0.00 | 449 | +1.35 | 449 | +1.35 |

Text size
---------

| Benchmark       |   trunk |  strict |     % |      x1 |     % |      x2 |     % |      x4 |     % |
|-----------------+---------+---------+-------+---------+-------+---------+-------+---------+-------|
| 500.perlbench_r | 2122882 | 2122962 | +0.00 | 2122962 | +0.00 | 2122962 | +0.00 | 2122514 | -0.02 |
| 502.gcc_r       | 8566290 | 8567794 | +0.02 | 8569138 | +0.03 | 8570066 | +0.04 | 8570642 | +0.05 |
| 505.mcf_r       |   26770 |   26770 | +0.00 |   26770 | +0.00 |   26770 | +0.00 |   26962 | +0.72 |
| 520.omnetpp_r   | 1713938 | 1713954 | +0.00 | 1714754 | +0.05 | 1714754 | +0.05 | 1714754 | +0.05 |
| 523.xalancbmk_r | 4881890 | 4882114 | +0.00 | 4882114 | +0.00 | 4882114 | +0.00 | 4882114 | +0.00 |
| 525.x264_r      |  601522 |  601602 | +0.01 |  601602 | +0.01 |  602130 | +0.10 |  602834 | +0.22 |
| 531.deepsjeng_r |   90306 |   90306 | +0.00 |   90338 | +0.04 |   90338 | +0.04 |   90338 | +0.04 |
| 541.leela_r     |  277634 |  277650 | +0.01 |  277650 | +0.01 |  282386 | +1.71 |  295778 | +6.54 |
| 548.exchange2_r |  109058 |  109058 | +0.00 |  109058 | +0.00 |  109058 | +0.00 |  109058 | +0.00 |
| 557.xz_r        |  154594 |  154594 | +0.00 |  154594 | +0.00 |  154610 | +0.01 |  154930 | +0.22 |


Zen SPEC 2017 FP -O2 generic tuning
===================================

Run-time
--------

| Benchmark       | trunk |   s |      % |  x1 |      % |  x2 |      % |  x4 |      % |
|-----------------+-------+-----+--------+-----+--------+-----+--------+-----+--------|
| 503.bwaves_r    |   801 | 801 |  +0.00 | 801 |  +0.00 | 801 |  +0.00 | 801 |  +0.00 |
| 507.cactuBSSN_r |   303 | 302 |  -0.33 | 299 |  -1.32 | 302 |  -0.33 | 307 |  +1.32 |
| 508.namd_r      |   306 | 306 |  +0.00 | 307 |  +0.33 | 306 |  +0.00 | 306 |  +0.00 |
| 510.parest_r    |   558 | 553 |  -0.90 | 561 |  +0.54 | 554 |  -0.72 | 562 |  +0.72 |
| 511.povray_r    |   679 | 672 |  -1.03 | 673 |  -0.88 | 680 |  +0.15 | 644 |  -5.15 |
| 519.lbm_r       |   240 | 240 |  +0.00 | 240 |  +0.00 | 240 |  +0.00 | 240 |  +0.00 |
| 521.wrf_r       |   851 | 827 |  -2.82 | 827 |  -2.82 | 827 |  -2.82 | 828 |  -2.70 |
| 526.blender_r   |   376 | 376 |  +0.00 | 379 |  +0.80 | 377 |  +0.27 | 376 |  +0.00 |
| 527.cam4_r      |   529 | 527 |  -0.38 | 533 |  +0.76 | 536 |  +1.32 | 528 |  -0.19 |
| 538.imagick_r   |   646 | 570 | -11.76 | 570 | -11.76 | 569 | -11.92 | 570 | -11.76 |
| 544.nab_r       |   467 | 467 |  +0.00 | 467 |  +0.00 | 467 |  +0.00 | 467 |  +0.00 |
| 549.fotonik3d_r |   413 | 413 |  +0.00 | 414 |  +0.24 | 415 |  +0.48 | 413 |  +0.00 |
| 554.roms_r      |   459 | 455 |  -0.87 | 456 |  -0.65 | 456 |  -0.65 | 456 |  -0.65 |

Text size
---------

| Benchmark       |    trunk |   strict |     % |       x1 |     % |       x2 |     % |       x4 |     % |
|-----------------+----------+----------+-------+----------+-------+----------+-------+----------+-------|
| 503.bwaves_r    |    32034 |    32034 | +0.00 |    32034 | +0.00 |    32034 | +0.00 |    32034 | +0.00 |
| 507.cactuBSSN_r |  2951634 |  2951634 | +0.00 |  2951634 | +0.00 |  2951698 | +0.00 |  2951730 | +0.00 |
| 508.namd_r      |   837458 |   837490 | +0.00 |   837490 | +0.00 |   837490 | +0.00 |   837490 | +0.00 |
| 510.parest_r    |  6540866 |  6545618 | +0.07 |  6546754 | +0.09 |  6561426 | +0.31 |  6569426 | +0.44 |
| 511.povray_r    |   803618 |   803666 | +0.01 |   804274 | +0.08 |   804706 | +0.14 |   805842 | +0.28 |
| 519.lbm_r       |    12018 |    12018 | +0.00 |    12018 | +0.00 |    12018 | +0.00 |    12018 | +0.00 |
| 521.wrf_r       | 16292962 | 16296786 | +0.02 | 16296978 | +0.02 | 16302594 | +0.06 | 16419842 | +0.78 |
| 526.blender_r   |  7268224 |  7281264 | +0.18 |  7282608 | +0.20 |  7289168 | +0.29 |  7295296 | +0.37 |
| 527.cam4_r      |  5063666 |  5063922 | +0.01 |  5065010 | +0.03 |  5068114 | +0.09 |  5072946 | +0.18 |
| 538.imagick_r   |  1608178 |  1609282 | +0.07 |  1609282 | +0.07 |  1613458 | +0.33 |  1613970 | +0.36 |
| 544.nab_r       |   156242 |   156242 | +0.00 |   156242 | +0.00 |   156242 | +0.00 |   156242 | +0.00 |
| 549.fotonik3d_r |   326738 |   326738 | +0.00 |   326738 | +0.00 |   326738 | +0.00 |   326738 | +0.00 |
| 554.roms_r      |   728546 |   728546 | +0.00 |   728546 | +0.00 |   728546 | +0.00 |   728546 | +0.00 |


Zen SPEC 2017 FP -Ofast native tuning
=====================================

Run-time
--------

| Benchmark       | trunk |   s |     % |  x1 |     % |  x2 |     % |  x4 |     % |
|-----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------|
| 503.bwaves_r    |   310 | 310 | +0.00 | 310 | +0.00 | 310 | +0.00 | 309 | -0.32 |
| 507.cactuBSSN_r |   269 | 266 | -1.12 | 266 | -1.12 | 268 | -0.37 | 270 | +0.37 |
| 508.namd_r      |   270 | 269 | -0.37 | 269 | -0.37 | 268 | -0.74 | 268 | -0.74 |
| 510.parest_r    |   607 | 601 | -0.99 | 599 | -1.32 | 599 | -1.32 | 604 | -0.49 |
| 511.povray_r    |   662 | 664 | +0.30 | 671 | +1.36 | 680 | +2.72 | 675 | +1.96 |
| 519.lbm_r       |   186 | 186 | +0.00 | 186 | +0.00 | 186 | +0.00 | 186 | +0.00 |
| 521.wrf_r       |   550 | 554 | +0.73 | 550 | +0.00 | 550 | +0.00 | 549 | -0.18 |
| 526.blender_r   |   355 | 354 | -0.28 | 355 | +0.00 | 354 | -0.28 | 354 | -0.28 |
| 527.cam4_r      |   434 | 437 | +0.69 | 435 | +0.23 | 437 | +0.69 | 435 | +0.23 |
| 538.imagick_r   |   433 | 420 | -3.00 | 420 | -3.00 | 420 | -3.00 | 419 | -3.23 |
| 544.nab_r       |   424 | 425 | +0.24 | 425 | +0.24 | 425 | +0.24 | 425 | +0.24 |
| 549.fotonik3d_r |   421 | 422 | +0.24 | 422 | +0.24 | 422 | +0.24 | 422 | +0.24 |
| 554.roms_r      |   360 | 361 | +0.28 | 361 | +0.28 | 361 | +0.28 | 361 | +0.28 |

+1.36% for 511.povray_r is the worst regression for the proposed x1
defaults, by the way.  I have not investigated it further, however.

Text size
---------

| Benchmark       |    trunk |   strict |     % |       x1 |     % |       x2 |     % |       x4 |     % |
|-----------------+----------+----------+-------+----------+-------+----------+-------+----------+-------|
| 503.bwaves_r    |    34562 |    34562 | +0.00 |    34562 | +0.00 |    34562 | +0.00 |    34562 | +0.00 |
| 507.cactuBSSN_r |  3978402 |  3978402 | +0.00 |  3978402 | +0.00 |  3978514 | +0.00 |  3978546 | +0.00 |
| 508.namd_r      |   869106 |   869154 | +0.01 |   869154 | +0.01 |   869154 | +0.01 |   869154 | +0.01 |
| 510.parest_r    |  7186258 |  7189298 | +0.04 |  7190370 | +0.06 |  7203890 | +0.25 |  7211202 | +0.35 |
| 511.povray_r    |  1063314 |  1063410 | +0.01 |  1064178 | +0.08 |  1064546 | +0.12 |  1065890 | +0.24 |
| 519.lbm_r       |    12178 |    12178 | +0.00 |    12178 | +0.00 |    12178 | +0.00 |    12178 | +0.00 |
| 521.wrf_r       | 19480946 | 19484146 | +0.02 | 19484466 | +0.02 | 19607538 | +0.65 | 19716178 | +1.21 |
| 526.blender_r   |  9708752 |  9719952 | +0.12 |  9722768 | +0.14 |  9730224 | +0.22 |  9737760 | +0.30 |
| 527.cam4_r      |  6217970 |  6218162 | +0.00 |  6219570 | +0.03 |  6223362 | +0.09 |  6227762 | +0.16 |
| 538.imagick_r   |  2255682 |  2256162 | +0.02 |  2256162 | +0.02 |  2261346 | +0.25 |  2261938 | +0.28 |
| 544.nab_r       |   212418 |   212418 | +0.00 |   212418 | +0.00 |   212418 | +0.00 |   212578 | +0.08 |
| 549.fotonik3d_r |   454738 |   454738 | +0.00 |   454738 | +0.00 |   454738 | +0.00 |   454738 | +0.00 |
| 554.roms_r      |   910978 |   910978 | +0.00 |   910978 | +0.00 |   910978 | +0.00 |   910978 | +0.00 |


I believe the numbers are good and thus I would like to ask for
re-consideration of the objection and for approval to commit the patch
below.
Needless to say, it has passed bootstrap and testing on x86_64-linux. Thanks Martin 2017-10-27 Martin Jambor <mjam...@suse.cz> PR target/80689 * tree-sra.h: New file. * ipa-prop.h: Moved declaration of build_ref_for_offset to tree-sra.h. * expr.c: Include params.h and tree-sra.h. (emit_move_elementwise): New function. (store_expr_with_bounds): Optionally use it. * ipa-cp.c: Include tree-sra.h. * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New. (PARAM_MAX_INSNS_FOR_ELEMENTWISE_COPY): Likewise. * config/i386/i386.c (ix86_option_override_internal): Set PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35. * tree-sra.c: Include tree-sra.h. (scalarizable_type_p): Renamed to simple_mix_of_records_and_arrays_p, made public, renamed the second parameter to allow_char_arrays, added count_p parameter. (extract_min_max_idx_from_array): New function. (completely_scalarize): Moved bits of the function to extract_min_max_idx_from_array. testsuite/ * gcc.target/i386/pr80689-1.c: New test. Added insns count param limit --- gcc/config/i386/i386.c | 4 + gcc/expr.c | 106 ++++++++++++++++++++++- gcc/ipa-cp.c | 1 + gcc/ipa-prop.h | 4 - gcc/params.def | 12 +++ gcc/testsuite/gcc.target/i386/pr80689-1.c | 38 +++++++++ gcc/tree-sra.c | 134 +++++++++++++++++++++--------- gcc/tree-sra.h | 34 ++++++++ 8 files changed, 288 insertions(+), 45 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c create mode 100644 gcc/tree-sra.h diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 80c8ce7ecb9..0bff2da72dd 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -4580,6 +4580,10 @@ ix86_option_override_internal (bool main_args_p, ix86_tune_cost->l2_cache_size, opts->x_param_values, opts_set->x_param_values); + maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY, + 35, + opts->x_param_values, + opts_set->x_param_values); /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. 
*/ if (opts->x_flag_prefetch_loop_arrays < 0 diff --git a/gcc/expr.c b/gcc/expr.c index 496d492c9fa..971880b635d 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -61,7 +61,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-chkp.h" #include "rtl-chkp.h" #include "ccmp.h" - +#include "params.h" +#include "tree-sra.h" /* If this is nonzero, we do not bother generating VOLATILE around volatile memory references, and we are willing to @@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from) return maybe_expand_insn (code, 2, ops); } +/* Generate code for copying data of type TYPE at SOURCE plus OFFSET to TARGET + plus OFFSET, but do so element-wise and/or field-wise for each record and + array within TYPE. TYPE must either be a register type or an aggregate + complying with scalarizable_type_p. + + If CALL_PARAM_P is nonzero, this is a store into a call param on the + stack, and block moves may need to be treated specially. */ + +static void +emit_move_elementwise (tree type, rtx target, rtx source, HOST_WIDE_INT offset, + int call_param_p) +{ + switch (TREE_CODE (type)) + { + case RECORD_TYPE: + for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld)) + if (TREE_CODE (fld) == FIELD_DECL) + { + HOST_WIDE_INT fld_offset = offset + int_bit_position (fld); + tree ft = TREE_TYPE (fld); + emit_move_elementwise (ft, target, source, fld_offset, + call_param_p); + } + break; + + case ARRAY_TYPE: + { + tree elem_type = TREE_TYPE (type); + HOST_WIDE_INT el_size = tree_to_shwi (TYPE_SIZE (elem_type)); + gcc_assert (el_size > 0); + + offset_int idx, max; + /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1. 
*/ + if (extract_min_max_idx_from_array (type, &idx, &max)) + { + HOST_WIDE_INT el_offset = offset; + for (; idx <= max; ++idx) + { + emit_move_elementwise (elem_type, target, source, el_offset, + call_param_p); + el_offset += el_size; + } + } + } + break; + default: + machine_mode mode = TYPE_MODE (type); + + rtx ntgt = adjust_address (target, mode, offset / BITS_PER_UNIT); + rtx nsrc = adjust_address (source, mode, offset / BITS_PER_UNIT); + + /* TODO: Figure out whether the following is actually necessary. */ + if (target == ntgt) + ntgt = copy_rtx (target); + if (source == nsrc) + nsrc = copy_rtx (source); + + gcc_assert (mode != VOIDmode); + if (mode != BLKmode) + emit_move_insn (ntgt, nsrc); + else + { + /* For example vector gimple registers can end up here. */ + rtx size = expand_expr (TYPE_SIZE_UNIT (type), NULL_RTX, + TYPE_MODE (sizetype), EXPAND_NORMAL); + emit_block_move (ntgt, nsrc, size, + (call_param_p + ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL)); + } + break; + } + return; +} + /* Generate code for computing expression EXP, and storing the value into TARGET. @@ -5713,9 +5788,32 @@ store_expr_with_bounds (tree exp, rtx target, int call_param_p, emit_group_store (target, temp, TREE_TYPE (exp), int_size_in_bytes (TREE_TYPE (exp))); else if (GET_MODE (temp) == BLKmode) - emit_block_move (target, temp, expr_size (exp), - (call_param_p - ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL)); + { + /* Copying smallish BLKmode structures with emit_block_move and thus + by-pieces can result in store-to-load stalls. So copy some simple + small aggregates element or field-wise. 
*/ + int count = 0; + if (GET_MODE (target) == BLKmode + && AGGREGATE_TYPE_P (TREE_TYPE (exp)) + && !TREE_ADDRESSABLE (TREE_TYPE (exp)) + && tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (exp))) + && (tree_to_shwi (TYPE_SIZE (TREE_TYPE (exp))) + <= (PARAM_VALUE (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY) + * BITS_PER_UNIT)) + && simple_mix_of_records_and_arrays_p (TREE_TYPE (exp), false, + &count) + && (count <= PARAM_VALUE (PARAM_MAX_INSNS_FOR_ELEMENTWISE_COPY))) + { + /* FIXME: Can this happen? What would it mean? */ + gcc_assert (!reverse); + emit_move_elementwise (TREE_TYPE (exp), target, temp, 0, + call_param_p); + } + else + emit_block_move (target, temp, expr_size (exp), + (call_param_p + ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL)); + } /* If we emit a nontemporal store, there is nothing else to do. */ else if (nontemporal && emit_storent_insn (target, temp)) ; diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c index d23c1d8ba3e..30f91e70c22 100644 --- a/gcc/ipa-cp.c +++ b/gcc/ipa-cp.c @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3. 
If not see #include "tree-ssa-ccp.h" #include "stringpool.h" #include "attribs.h" +#include "tree-sra.h" template <typename valtype> class ipcp_value; diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h index fa5bed49ee0..2313cc884ed 100644 --- a/gcc/ipa-prop.h +++ b/gcc/ipa-prop.h @@ -877,10 +877,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *, void ipa_release_body_info (struct ipa_func_body_info *); tree ipa_get_callee_param_type (struct cgraph_edge *e, int i); -/* From tree-sra.c: */ -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree, - gimple_stmt_iterator *, bool); - /* In ipa-cp.c */ void ipa_cp_c_finalize (void); diff --git a/gcc/params.def b/gcc/params.def index 8881f4c403a..9c778f9540a 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -1287,6 +1287,18 @@ DEFPARAM (PARAM_VECT_EPILOGUES_NOMASK, "Enable loop epilogue vectorization using smaller vector size.", 0, 0, 1) +DEFPARAM (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY, + "max-size-for-elementwise-copy", + "Maximum size in bytes of a structure or an array to be considered " + "for copying by its individual fields or elements", + 0, 0, 512) + +DEFPARAM (PARAM_MAX_INSNS_FOR_ELEMENTWISE_COPY, + "max-insns-for-elementwise-copy", + "Maximum number of instructions needed to consider copying " + "a structure or an array by its individual fields or elements", + 6, 0, 64) + /* Local variables: diff --git a/gcc/testsuite/gcc.target/i386/pr80689-1.c b/gcc/testsuite/gcc.target/i386/pr80689-1.c new file mode 100644 index 00000000000..4156d4fba45 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr80689-1.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +typedef struct st1 +{ + long unsigned int a,b; + long int c,d; +}R; + +typedef struct st2 +{ + int t; + R reg; +}N; + +void Set (const R *region, N *n_info ); + +void test(N *n_obj ,const long unsigned int a, const long unsigned int b, const long int c,const long int d) +{ + R reg; + + reg.a=a; + reg.b=b; + 
reg.c=c; + reg.d=d; + Set (&reg, n_obj); + +} + +void Set (const R *reg, N *n_obj ) +{ + n_obj->reg=(*reg); +} + + +/* { dg-final { scan-assembler-not "%(x|y|z)mm\[0-9\]+" } } */ +/* { dg-final { scan-assembler-not "movdqu" } } */ +/* { dg-final { scan-assembler-not "movups" } } */ diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index bac593951e7..d06463ce21c 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3. If not see #include "ipa-fnsummary.h" #include "ipa-utils.h" #include "builtins.h" +#include "tree-sra.h" /* Enumeration of all aggregate reductions we can do. */ enum sra_mode { SRA_MODE_EARLY_IPA, /* early call regularization */ @@ -952,14 +953,15 @@ create_access (tree expr, gimple *stmt, bool write) } -/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length - ARRAY_TYPE with fields that are either of gimple register types (excluding - bit-fields) or (recursively) scalarizable types. CONST_DECL must be true if - we are considering a decl from constant pool. If it is false, char arrays - will be refused. */ +/* Return true if TYPE consists of RECORD_TYPE or fixed-length ARRAY_TYPE with + fields/elements that are not bit-fields and are either register types or + recursively comply with simple_mix_of_records_and_arrays_p. Furthermore, if + ALLOW_CHAR_ARRAYS is false, the function will return false also if TYPE + contains an array of elements that only have one byte. 
*/ -static bool -scalarizable_type_p (tree type, bool const_decl) +bool +simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays, + int *count_p) { gcc_assert (!is_gimple_reg_type (type)); if (type_contains_placeholder_p (type)) @@ -976,8 +978,13 @@ scalarizable_type_p (tree type, bool const_decl) if (DECL_BIT_FIELD (fld)) return false; - if (!is_gimple_reg_type (ft) - && !scalarizable_type_p (ft, const_decl)) + if (is_gimple_reg_type (ft)) + { + if (count_p) + (*count_p)++; + } + else if (!simple_mix_of_records_and_arrays_p (ft, allow_char_arrays, + count_p)) return false; } @@ -986,7 +993,7 @@ scalarizable_type_p (tree type, bool const_decl) case ARRAY_TYPE: { HOST_WIDE_INT min_elem_size; - if (const_decl) + if (allow_char_arrays) min_elem_size = 0; else min_elem_size = BITS_PER_UNIT; @@ -1007,9 +1014,45 @@ scalarizable_type_p (tree type, bool const_decl) return false; tree elem = TREE_TYPE (type); - if (!is_gimple_reg_type (elem) - && !scalarizable_type_p (elem, const_decl)) - return false; + if (!count_p) + { + if (!is_gimple_reg_type (elem) + && !simple_mix_of_records_and_arrays_p (elem, allow_char_arrays, + NULL)) + return false; + else + return true; + } + + offset_int min, max; + HOST_WIDE_INT ds; + bool nonzero = extract_min_max_idx_from_array (type, &min, &max); + + if (nonzero && (min <= max)) + { + offset_int d = max - min + 1; + if (!wi::fits_shwi_p (d)) + return false; + ds = d.to_shwi (); + if (ds > INT_MAX) + return false; + } + else + ds = 0; + + if (is_gimple_reg_type (elem)) + *count_p += (int) ds; + else + { + int elc = 0; + if (!simple_mix_of_records_and_arrays_p (elem, allow_char_arrays, + &elc)) + return false; + ds *= elc; + if (ds > INT_MAX) + return false; + *count_p += (unsigned) ds; + } return true; } default: @@ -1017,10 +1060,38 @@ scalarizable_type_p (tree type, bool const_decl) } } -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree); +static void scalarize_elem (tree, HOST_WIDE_INT, 
HOST_WIDE_INT, bool, tree, + tree); + +/* For a given array TYPE, return false if its domain does not have any maximum + value. Otherwise calculate MIN and MAX indices of the first and the last + element. */ + +bool +extract_min_max_idx_from_array (tree type, offset_int *min, offset_int *max) +{ + tree domain = TYPE_DOMAIN (type); + tree minidx = TYPE_MIN_VALUE (domain); + gcc_assert (TREE_CODE (minidx) == INTEGER_CST); + tree maxidx = TYPE_MAX_VALUE (domain); + if (!maxidx) + return false; + gcc_assert (TREE_CODE (maxidx) == INTEGER_CST); + + /* MINIDX and MAXIDX are inclusive, and must be interpreted in + DOMAIN (e.g. signed int, whereas min/max may be size_int). */ + *min = wi::to_offset (minidx); + *max = wi::to_offset (maxidx); + if (!TYPE_UNSIGNED (domain)) + { + *min = wi::sext (*min, TYPE_PRECISION (domain)); + *max = wi::sext (*max, TYPE_PRECISION (domain)); + } + return true; +} /* Create total_scalarization accesses for all scalar fields of a member - of type DECL_TYPE conforming to scalarizable_type_p. BASE + of type DECL_TYPE conforming to simple_mix_of_records_and_arrays_p. BASE must be the top-most VAR_DECL representing the variable; within that, OFFSET locates the member and REF must be the memory reference expression for the member. */ @@ -1047,27 +1118,14 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) { tree elemtype = TREE_TYPE (decl_type); tree elem_size = TYPE_SIZE (elemtype); - gcc_assert (elem_size && tree_fits_shwi_p (elem_size)); HOST_WIDE_INT el_size = tree_to_shwi (elem_size); gcc_assert (el_size > 0); - tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type)); - gcc_assert (TREE_CODE (minidx) == INTEGER_CST); - tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type)); + offset_int idx, max; /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1. 
*/ - if (maxidx) + if (extract_min_max_idx_from_array (decl_type, &idx, &max)) { - gcc_assert (TREE_CODE (maxidx) == INTEGER_CST); tree domain = TYPE_DOMAIN (decl_type); - /* MINIDX and MAXIDX are inclusive, and must be interpreted in - DOMAIN (e.g. signed int, whereas min/max may be size_int). */ - offset_int idx = wi::to_offset (minidx); - offset_int max = wi::to_offset (maxidx); - if (!TYPE_UNSIGNED (domain)) - { - idx = wi::sext (idx, TYPE_PRECISION (domain)); - max = wi::sext (max, TYPE_PRECISION (domain)); - } for (int el_off = offset; idx <= max; ++idx) { tree nref = build4 (ARRAY_REF, elemtype, @@ -1088,10 +1146,10 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) } /* Create total_scalarization accesses for a member of type TYPE, which must - satisfy either is_gimple_reg_type or scalarizable_type_p. BASE must be the - top-most VAR_DECL representing the variable; within that, POS and SIZE locate - the member, REVERSE gives its torage order. and REF must be the reference - expression for it. */ + satisfy either is_gimple_reg_type or simple_mix_of_records_and_arrays_p. + BASE must be the top-most VAR_DECL representing the variable; within that, + POS and SIZE locate the member, REVERSE gives its storage order, and REF must + be the reference expression for it. */ static void scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse, @@ -1111,7 +1169,8 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse, } /* Create a total_scalarization access for VAR as a whole. VAR must be of a - RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p. */ + RECORD_TYPE or ARRAY_TYPE conforming to + simple_mix_of_records_and_arrays_p. 
*/ static void create_total_scalarization_access (tree var) @@ -2803,8 +2862,9 @@ analyze_all_variable_accesses (void) { tree var = candidate (i); - if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var), - constant_decl_p (var))) + if (VAR_P (var) + && simple_mix_of_records_and_arrays_p (TREE_TYPE (var), + constant_decl_p (var), NULL)) { if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) <= max_scalarization_size) diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h new file mode 100644 index 00000000000..2857688b21e --- /dev/null +++ b/gcc/tree-sra.h @@ -0,0 +1,34 @@ +/* tree-sra.h - Declarations of functions exported from tree-sra.c. + Copyright (C) 2017 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#ifndef TREE_SRA_H +#define TREE_SRA_H + + +bool simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays, + int *count_p); +bool extract_min_max_idx_from_array (tree type, offset_int *min, + offset_int *max); +tree build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset, + bool reverse, tree exp_type, + gimple_stmt_iterator *gsi, bool insert_after); + + + +#endif /* TREE_SRA_H */ -- 2.14.2
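
For readers who want to play with the two limits outside of GCC, here is a rough stand-alone sketch of the decision the store_expr_with_bounds hunk makes: the size check first, then the count of scalar copies, then the comparison against the instruction limit. Everything named toy_* is a hypothetical model invented for this illustration and is not GCC's tree representation or API. The overflow check is also done before the multiplication, which is an assumption of this sketch rather than what the patch does. The sample R type mirrors the struct in pr80689-1.c, and the sketch leaves its byte size to the caller (32 on an LP64 target).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a type; it is not GCC's tree representation.  */
enum toy_kind { TOY_SCALAR, TOY_BITFIELD, TOY_RECORD, TOY_ARRAY };

struct toy_type
{
  enum toy_kind kind;
  const struct toy_type *const *fields; /* TOY_RECORD: field types.  */
  size_t n_fields;                      /* TOY_RECORD: number of fields.  */
  const struct toy_type *elem;          /* TOY_ARRAY: element type.  */
  long n_elems;                         /* TOY_ARRAY: number of elements.  */
};

/* Add to *COUNT the number of scalar moves needed to copy T piecewise.
   Return false if T contains a bit-field or the count would exceed
   INT_MAX.  */
static bool
toy_count_copies (const struct toy_type *t, int *count)
{
  assert (t != NULL && count != NULL);
  switch (t->kind)
    {
    case TOY_SCALAR:
      if (*count == INT_MAX)
        return false;
      (*count)++;
      return true;

    case TOY_BITFIELD:
      return false;

    case TOY_RECORD:
      for (size_t i = 0; i < t->n_fields; i++)
        if (!toy_count_copies (t->fields[i], count))
          return false;
      return true;

    case TOY_ARRAY:
      {
        int per_elem = 0;
        if (t->n_elems < 0 || !toy_count_copies (t->elem, &per_elem))
          return false;
        /* Check before multiplying so the product cannot overflow.  */
        if (per_elem != 0 && t->n_elems > (INT_MAX - *count) / per_elem)
          return false;
        *count += (int) t->n_elems * per_elem;
        return true;
      }
    }
  return false;
}

/* Decide whether a copy of T, SIZE_BYTES large, should go element-wise.
   MAX_SIZE and MAX_INSNS stand in for the two new --param values.  */
static bool
toy_use_elementwise_copy (const struct toy_type *t, long size_bytes,
                          long max_size, int max_insns)
{
  int count = 0;
  return (size_bytes <= max_size
          && toy_count_copies (t, &count)
          && count <= max_insns);
}

/* Sample types: R from pr80689-1.c (four scalars) and a 32-element array
   of scalars, as in the "32 individual chars" concern.  */
static const struct toy_type toy_scalar = { TOY_SCALAR, NULL, 0, NULL, 0 };
static const struct toy_type *const toy_r_fields[]
  = { &toy_scalar, &toy_scalar, &toy_scalar, &toy_scalar };
static const struct toy_type toy_r
  = { TOY_RECORD, toy_r_fields, 4, NULL, 0 };
static const struct toy_type toy_chars32
  = { TOY_ARRAY, NULL, 0, &toy_scalar, 32 };
```

With a size limit of 32 bytes and the default instruction limit of 6, toy_use_elementwise_copy accepts toy_r (four copies) and rejects toy_chars32 (thirty-two copies), which is the behaviour the second parameter is meant to provide.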