On Wed, Nov 10, 2010 at 2:27 PM, Jonas Maebe <jonas.ma...@elis.ugent.be> wrote:
>
> On 30 Oct 2010, at 13:20, Matthias K. wrote:
>
>> the last days I've done a first step in Porting the i386 data flow
>> analyzer, asmcse and peephole optimizations.
>
> Quite impressive!
>
>> Main motivation is: target instruction level optimization is always a
>> good thing especially for bottlenecks.
>
> That's true. There's one small problem though: the asmcse optimiser
> (csopt386, and large parts of daopt386) has been on its way out the last
> couple of releases, because it contains some bugs that are very hard to fix
> due to it not being very good/clean code. It is already no longer activated
> by default for -O2 since FPC 2.4.0, and currently has to be enabled
> explicitly via -Ooasmcse.
>
> The final drop in the bucket that caused it to be disabled by default was
> http://bugs.freepascal.org/view.php?id=14363
>
>> The main target was: porting the i386 optimization part to x86_64
>> (amd64) and merging it back such that generic x86 optimization is in
>> one place.
>
> If you are willing to take responsibility for that code (feel free to
> completely rewrite it), that would be great. Then it can maybe be enabled
> again by default.

Thanks for the info (and especially for the "impressive" :) ).

Some analysis parts are not working correctly for x64, and it is hard to track the problems down. Furthermore, I assume that some parts simply don't work because of "special x64 problems" and x64 code generation, like cleaning the upper 32 bits with "and $FFFFFFFF,x" and a lot of "opsize differences" such as transfer/arithmetic on 8/16/32/64bit, which need additional code for handling/post-processing and checks. In short, the asmcse part isn't working correctly and the bug is not triggered. The only thing I've seen working is a bit of value propagation for loads, which rewrites some common ref,reg moves into reg,reg moves.

Anyway, I'm already working on a rewrite, but it's currently unclean and more like educational prototype work ;). Instead of the node based approach, I've started with a "something like base blocks" rewrite for block local and block global (over path) analysis. For example, the label optimizations (inverting conditions, removing trivial jmps, rewriting jump chains) are implemented now and do mostly the same as the generic optimizations which are implemented in aopt*.pas, with the difference that they run over blocks. (The second difference is that the new approach removes more labels. It seems there is a missing ...ref^.symbol.decrefs somewhere in the generic part.)

For this "educational prototype" I need to rewrite parts of the analysis, which is a good way of finding problems and removing some of the minor ones too. Second, I learn a lot about the fpc internals along the way (all units with cg*/ra*/aasm*/cpu*) and can take apart some things that are mixed together in the node based optimization.

In short (again), I'm looking into it, and I'm both interested in that code and willing to take responsibility for anything I'm writing. But it'll take some time to fix everything up to the cse, as always.
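
As a rough illustration of the label optimizations mentioned above (removing trivial jmps, rewriting jump chains, dropping labels that are no longer referenced), here is a minimal sketch in Python rather than Pascal. The tuple-based instruction list and the names `optimize_jumps`, `_one_pass` and `_next_real` are made up for this example; they are not the data structures or routines of aopt*.pas or of the prototype. The sketch assumes that labels are referenced only by jumps inside the same list, and it does not cover inverting conditions.

```python
# Hypothetical mini-IR (an assumption for this sketch, not FPC's lists):
#   ("label", name)      jump target
#   ("jmp", name)        unconditional jump
#   ("jcc", cond, name)  conditional jump
#   ("op", text)         any other instruction


def _next_real(code, i):
    """Return the index of the first non-label entry at or after i."""
    while i < len(code) and code[i][0] == "label":
        i += 1
    return i


def _one_pass(code):
    pos = {ins[1]: i for i, ins in enumerate(code) if ins[0] == "label"}

    def resolve(label):
        # Follow "label: jmp other" chains; the seen set stops on cycles,
        # and labels defined outside this list are left alone.
        seen = set()
        while label in pos and label not in seen:
            seen.add(label)
            i = _next_real(code, pos[label])
            if i < len(code) and code[i][0] == "jmp":
                label = code[i][1]
            else:
                break
        return label

    # 1) Rewrite jump chains: point every jump at the end of its chain.
    out = []
    for ins in code:
        if ins[0] == "jmp":
            out.append(("jmp", resolve(ins[1])))
        elif ins[0] == "jcc":
            out.append(("jcc", ins[1], resolve(ins[2])))
        else:
            out.append(ins)

    # 2) Remove trivial jumps: the target is among the labels that
    #    immediately follow the jump, so execution falls through anyway.
    kept = []
    for i, ins in enumerate(out):
        if ins[0] in ("jmp", "jcc"):
            j = i + 1
            falls_through = False
            while j < len(out) and out[j][0] == "label":
                if out[j][1] == ins[-1]:
                    falls_through = True
                j += 1
            if falls_through:
                continue
        kept.append(ins)

    # 3) Drop labels that no remaining jump refers to.
    used = {ins[-1] for ins in kept if ins[0] in ("jmp", "jcc")}
    return [ins for ins in kept if ins[0] != "label" or ins[1] in used]


def optimize_jumps(code):
    """Repeat the pass until the instruction list stops changing."""
    code = list(code)
    while True:
        new = _one_pass(code)
        if new == code:
            return new
        code = new


example = [
    ("jcc", "e", "L1"),
    ("op", "mov eax, 1"),
    ("jmp", "L3"),
    ("label", "L1"),
    ("jmp", "L2"),
    ("label", "L2"),
    ("jmp", "L3"),
    ("label", "L3"),
    ("op", "ret"),
]
for ins in optimize_jumps(example):
    print(ins)
```

Repeating the pass matters here: removing one jump or label can make a neighbouring jump trivial, which a single sweep would miss. A real implementation would track label reference counts instead of rescanning the list.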

Another question would be: is there any documentation (other than the source) about the generic target/i386 optimizer parts, the assumptions about code generation, etc.? It may be a good idea to write some documentation in parallel (basics like: optimization is performed on a "per proc basis" without assembler blocks; some notes about markers/reg allocation info/...; specific i386/x86 assumptions about register order and mapping, and so on), which could help fix problems later on and speed up the learning phase for any interested developer.

>> This is currently not complete, i didn't merge it back since there is
>> still testing and review todo. But from the current point of view it
>> should be rather simple to merge the data flow analysis and the
>> asmcse parts. The peephole part is another point, that should be pure
>> cpu/target specific.
>
> I guess there are some common ones there as well, no? (especially regarding
> mov's and jump chaining).
>
>> Like I stated above, the current approach needs further testing (fpc
>> testsuite returns same result for patched and unpatched compiler with
>> "make full", but there may be things missing) and review from others
>> (hopefully with more knowledge about the x86_64 code generator part
>> and potential optimizations). Thats why I'm attaching my current
>> approach here.
>
> At first look, I think it's ok except for the indentation. Please use the
> same style as the original code (e.g., indenting "begin" after "if ...").
> See http://wiki.freepascal.org/Coding_style for some more info.
>
>> TODO: There is potential for further optimizations, especially for x87
>> and 128bit Media/XOP/FM4.. but the code needs some cleanups before and
>> possibly some bug fixes
>>
>> I'm open for any feedback, bugfixes and so on (and if it should be
>> merged with i386 parts)
>
> Merging with i386 is fine!
> The whole assembler optimiser infrastructure is
> also quite independent from the rest of the compiler, which makes it a very
> good way to get started (it's how I rolled into FPC development in 1997,
> which is in large part why the code's organisation is so bad :)

Hehe, yes. At least it is more understandable than other parts, and I like the concept of optimizations on "a simple, robust list" of instructions and information per proc.

By the way, I merged the *opta64 code back and introduced some new constants for simpler merging. The merge was like a second rewrite of the parts that I changed for the x64 port; for example, most of the RS_* register renaming is dropped because of the mappings. I've fixed the coding style while merging.

The problem is, as I said above, that the cse part is not really working. There are still 2 open questions: the deallocation of registers for procs, and register rdx in special cases like the 128bit result from imul. And the peephole code is mostly unreadable due to massive {$ifdef} usage in the merge; there are common parts, yes, but the big problem is the "not-common" parts like the amd64 imul parts.

Having said that, I'll open a bug for the patch proposal and for discussions about which parts should be redone. At least the peephole code is usable and introduces minor improvements and a basis for x64 sequence alternatives. (asmcse is deactivated and could be removed completely for x64; the peephole part could also be deactivated for now.)

Thanks for the feedback and the information about the da/cse parts.

Bye,
Matthias Karbe
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel