------- Comment #24 from hubicka at gcc dot gnu dot org 2008-02-08 15:11 ------- Hi, the tonight runs with continue heuristics shows again improvements on 64bit scores , but degradation on 32bit scores. Looking into the loop, the real trouble seems to be that the main loop has 6 loop carried variables:
scan_end, scan_end1, best_len, scan, chain_length, cur_match plus few temporaries are needed too. Obviously we can't fit in registers on i386. Making profile more realistic sometime helps sometimes hurts pretty much at random basis. One case where I think register presure is increased is the fact that different SSA names of both scan_end and scan_end1 variables are actually not fully coalesced in out-of-SSA. This is result of optimizing: if (match[best_len] != scan_end || match[best_len-1] != scan_end1 || *match != *scan || *++match != scan[1]) continue; ...later code sometimes modifying scan_end.... into computing match[best_len] into name of scan_end that is sometimes assigned int the later code on the path not modifying scan_end. As a result we do have two scan_ends live at once. I wonder if we can avoid this behaviour, though it looks all right on SSA form, it would save 2 "global" registers: there is no need at all to cache match[best_len]/match[best_len1] in register unless I missed something. Those two vars are manipulated on the hot paths through the loop. Now the RA is driven by frequencies (bit confused by fact that two of loop carried vars are split) and by their "liveranges" that is actually number of instructions in bettween first and last occurence. Since we are bit carelless on BB ordering moving some code to the very end of function, this heuristics is not realistic at all. It would probably make more sense to replace it by number of inssn it is live across, but this is probably ninsn*npseudos to compute. Other idea would be degree in conflict graph, but I am not sure we want to start such experiemtns in parallel with YARA. I tested YARA and it does not handle this situation much better. Perhaps Vladimir can help? Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761