In order to replace VRP, we need an alternative for the jump threader embedded within it. Currently, this threader is a forward threader client that uses ASSERT_EXPRs and the avail/const framework to resolve statements along a path.

As I have mentioned in the past weeks, I am proposing a hybrid replacement, where we still use the forward threader, but with the path solver used by the backward threader. This provides an incremental path to remove the forward threader with minimal disruption.

As a reminder, the reason we can't just use the backward threader for a VRP-threader replacement right now is because the forward and backward threaders have different cost models and block copiers that have not been merged. I'd like to address these differences for the next release. This, coupled with support for floats in ranger, would allow us to nuke the forward threader entirely. But for now, a more gradual approach is needed.

I have patches to the path solver that address all the shortcomings it had in my initial rewrite. That is, the ability to use the ranger and the relation oracle to fully resolve ranges and relations not known within the path. With this enhanced solver, we are able to get everything the forward threader can get (both VRP and DOM clients, with the exception of floats for DOM). This path solver can then be used either with the backward threader (disabled by default), or with with the forward threader in the hybrid approach I propose for VRP.

I'd like to discuss how I tested this, what is missing, and alternatives on going forward.

SUMMARY
=======

The hybrid threader gets 14.5% more jump threads than the legacy code, but most of these threads are at the expense of other threading passes. The more it gets, the less DOM and the backward threader get. That being said, there is a total improvement of 0.87% jump threads in the total compilation.

This comes with no runtime penalty. In our callgrind testing harness derived from the .ii files from a bootstrap, I measured a 0.62% performance drop, well within the noise of the tool.

However, there is a 1.5% performance penalty for splitting the VRP threader from outside of VRP (for either hybrid or legacy). I would prefer divorcing embedded jump threading passes from their carriers, but if others disagree, we could piggy back on the ranger used in the upcoming VRP replacement (evrp has a ranger we could share).

TESTING
=======

The presence of ASSERT_EXPRs made it difficult to do a clean comparison, as ranger obviously doesn't use these. What I did was move the VRP threader into its own pass (regenerating ASSSERT_EXPRs and its environment), and then run my hybrid threader iteratively until no more changes (excluding ping-pongs). Then I ran the legacy code, and anything it could find, was something worth investigating.

BTW, the reason for the iterative approach, is because any threading pass creates opportunities for subsequent threading passes.

MISSING CASES
=============

There were a few things we missed, which can be summarized in broad categories:

a) A few missing range-op entries for things like RROTATE_EXPR. These are trivial to address.

b) Some paths which the solver could clearly solve, but it never got the chance, because of limitations in the path discovery code in the forward threader.

I fixed a few of these (upcoming patches), but I mostly avoided fixing the core forward threader code, as it would incur too much work for very little return. Besides, my plan is to eventually nuke it.

One example are paths whose first block ends in a conditional, but whose second is empty. These are usually empty loop pre-headers, which are not empty in legacy mode because the ASSERT_EXPR mechanism has put asserts in there. However, since the ranger uses the pristine IL, these blocks are empty, which throws off the threader. In practice, this didn't matter, because since the CFG had been cleaned up, these empty blocks were predominantly loop pre-headers which the threader was trying to thread through, crossing loop boundaries in the process. As has been discussed earlier, we shouldn't be threading through this.

c) The path solver returns UNDEFINED for unreachable paths, so we avoid threading them altogether. However, the legacy code is perfectly happy to give answers and thread through the paths.

d) I saw an odd missed propagation/recalculation here and there, where ranger returned varying, when it could've done better. This was very rare.

e) Finally, conditionals when either operand is a MEM_REF are not handled. Interestingly, this was uncommon as most everything had been simplified to an SSA.

As the numbers show, the above is probably noise. The only thing worth addressing may be the MEM_REF business. If this becomes a problem, it is precisely what we designed the pointer equivalence tracking we use in evrp. It could be easily adapted.

PATH FORWARD
============

I have various patchsets to the path solver, the forward threader, and VRP to make all this happen, but would like to get feedback on how to proceed.

I suggest we replace the VRP threader before replacing VRP[12] with evrp[23] to minimize the disruption. The hybrid threader can easily run after current VRP[12] as it stands.

I lean towards putting this post-VRP threader in its own pass, instead of keeping it embedded in VRP. After all, this was one of the original motivations for the ranger work, to rip out all the sub-passes that had accumulated inside VRP over the years. But I do understand if the 1.5% penalty is hard to swallow. In which case, we could move the hybrid threader to its own file, but sharing the ranger used by evrp[23] when the time comes.

That being said, when the threaders' cost models are merged next year, we could shuffle passes easier. There's no reason to thread before *and* after VRP, when the backward threader should be able to handle everything. We should be able to get back any lost time, and then some, by reducing the amount of jump threading passes.

One more thing, would it be preferred that I include a --param=vrp-threader=[legacy|hybrid] parameter along with the patch? I have such thing in my tree, but last time around, the ability to run in legacy was mostly a distraction that had to be removed immediately after.

I will be contributing/pushing various patches over the next few days that are improvements orthogonal to this work (for example, improvements to the forward threader and the path solver). But I'd like some feedback on the VRP threader bits before I post those.

Thanks.
Aldy

Reply via email to