On Tuesday, 27 March 2012 at 09:51:07 UTC, bearophile wrote:
Dmitry Olshansky:
Speaking more of run-time version of regex, it is essentially
running a VM that executes instructions that do various kinds
of match-this, match-that. The VM dispatch code is quite slow,
the optimal _threaded_ code requires either doing it in
_assembly_ or _computed_ goto in the language. The VM
_dispatch_ takes up to 30% of time in the default matcher.
I have used computed gotos in GCC-C to implement some quite
efficient finite state machines to be used in computational
biology. I've seen 20%+ speedups compared to my alternative
switch-based implementation. So I'd like computed gotos in D
too.
While I am in favor of all enhancements which improve low-level
access, I'm very surprised by your findings by computed gotos...
the compiler I am most used to(rvct for arm)... seems proficient
in emitting jump table instructions(TBB, TBH) for thumb2... but
based on your findings I will definitely re-check the generated
asm. Could it be that the compiler "heuristics" simply is less
than optimal... and an alternative would be to force a specific
implementation with a pragma? or the recent @annotation syntax...
pragma(switch, "jumptable")
pragma(switch, "binary-search-tree")
it would have the benefit of not having to re-factor the code and
one could easily benchmark which solution is the fastest for a
different inputs...