On 21-Jul-2014 02:29, Walter Bright wrote:
> On 7/20/2014 3:10 PM, Dmitry Olshansky wrote:
>>> The computed goto is faster for two reasons, according to the article:
>>>
>>> 1. The switch does a bit more per iteration because of bounds checking.

>> Now let's consider a proper implementation of a threaded-code interpreter,
>> where the code pointer points into an array of addresses. We've been
>> through this before, and it turns out the switch is slower because of an
>> extra load.
>>
>> a) Switch does 1 load for the opcode, 1 load from the jump table and 1
>> indirect jump to advance (not even counting the bounds checking of the
>> switch).
>>
>> b) Threaded code via (say) computed goto does 1 load of the opcode and 1
>> indirect jump, because the opcode is an address already (so there is no
>> separate jump table).
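
For reference, here is a minimal sketch of the switch-based loop from a); the
opcode set is invented for illustration, and the comments mark where the
loads and the indirect jump happen. The threaded counterpart is sketched
further down.

enum Op : ubyte { push, add, halt }

// a) switch-based dispatch: per iteration there is the bounds check on the
// switch value, 1 load of the opcode, 1 load from the generated jump table
// and 1 indirect jump to the selected case.
int runSwitch(const(ubyte)[] code)
{
    int[16] stack;
    size_t sp, pc;
    for (;;)
    {
        switch (cast(Op) code[pc++])  // load opcode; table load + indirect jump
        {
        case Op.push: stack[sp++] = code[pc++];         break;
        case Op.add:  --sp; stack[sp - 1] += stack[sp]; break;
        case Op.halt: return stack[sp - 1];
        default:      assert(0, "bad opcode");
        }
    }
}

unittest
{
    ubyte[] prog = [Op.push, 2, Op.push, 3, Op.add, Op.halt];
    assert(runSwitch(prog) == 5);  // 2 + 3
}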

> True, but I'd like to find a way that this can be done as an optimization.

I found a way, but it relies on tail-call optimization; otherwise it would overflow the stack. I would rather find some way that works without the -O flag.
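
Roughly, the shape is this (a minimal sketch; the handler set is invented for
illustration). Each handler finishes by calling the next handler taken
straight from the instruction stream, so once the compiler turns that
trailing call into a jump, dispatch is 1 load of the next address plus 1
indirect jump. Without that, every dispatch is a real call and a long enough
program overflows the stack.

struct Vm { int[16] stack; size_t sp; }

// An instruction is a handler address plus an inline operand, so "decoding"
// the next instruction is just loading ip[1].fn.
struct Instr { void function(Vm*, const(Instr)*) fn; int arg; }

void push(Vm* vm, const(Instr)* ip)
{
    vm.stack[vm.sp++] = ip.arg;
    return ip[1].fn(vm, ip + 1);  // load the next address, indirect jump
}

void add(Vm* vm, const(Instr)* ip)
{
    --vm.sp;
    vm.stack[vm.sp - 1] += vm.stack[vm.sp];
    return ip[1].fn(vm, ip + 1);
}

void halt(Vm* vm, const(Instr)* ip) {}

unittest
{
    Vm vm;
    auto code = [Instr(&push, 2), Instr(&push, 3), Instr(&add), Instr(&halt)];
    code[0].fn(&vm, code.ptr);
    assert(vm.stack[0] == 5);  // 2 + 3
}

With ldc2 at -O2/-O3 the trailing calls do become jumps in my tests; that is
exactly the dependence on optimization flags I'd like to get rid of.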

In fact, this brings up another, unrelated problem with Phobos: any template-heavy library has amazingly awful speed without inlining & optimization enabled _by the client_. It should be the same with C++, though.

>> I'm certain that forced tail calls would work just fine instead of computed
>> goto for this scenario. In fact, I've measured this with LDC and the
>> results are promising, but they only hold with -O2/-O3 (where the tail call
>> is optimized). I'd gladly dig them up if you are interested.

> I'm pretty reluctant to add language features that can be done as
> optimizations.

The point is: software that only works in a release build is kind of hard to develop, even more so with libraries. Thus I object to labeling such things as optimizations when they, in fact, change semantics.

--
Dmitry Olshansky