On Friday 07 May 2010 01:59:13 pm Ludovic Courtès wrote:
> Hi Stefan,
>
> stefan <stefan.ta...@spray.se> writes:
> > I've been experimenting lately with an inline match construct, very much
> > like using compiled regexps.
>
> Sounds interesting.
>
> > That is, I created a tiny VM that was targeted to do matching. To show
> > it, consider
> >
> > guile> (def f ((x y 'a 'b) (+ x y)))
>
> (Such a ‘def’ construct would be less convenient than ‘match’ IMO.)
You have the full match, match-lambda etc. of course if you need it. I just
use a little syntactic sugar.

> > large patterns yield a speedup of 15 times according to my tests compared
> > with (ice-9 match).
> >
> > I used guile-1.9.10. Does this release have a lot of checks compiled in
> > so that the comparison is unfair?
>
> All list/pair accessors type-check their arguments (this has always been
> the case with Guile so that Scheme code cannot yield to segfaults and
> the like.)

Usually the generated match code is something like

  (if (pair? x)
      (if (equal? x data)
          (let ((a34 (cdr x))) ...)
          (goto-next))
      (goto-next))

and the checks are already there. So for this isolated system it should be
fine to skip the check. But I don't think it explains the speed difference.

> > Anyway, I will try to tweak the code even further, something along
> >
> > (object-ref 1)
> > (fast-match)
> > (br L1)
> > (br L2)
> > (br L3)
> > ....
> > (br Ln)
> >
> > Here the fast-match routine can FETCH the br commands and issue them
> > directly, and hence one can have one compiled pattern instead of one for
> > each row in the matcher.
>
> I don’t quite understand how it differs from what happens without a
> ‘fast-match’ instruction. Can you show the code for ‘fast-match’?

The core of the matcher is a loop with a set of ifs arranged so that on
average it will do 2-3 checks and then find the instruction, so no named
labels. The core of the code looks like:

  #define M_CONTINUE {pat++; continue;}

  while (1)
    {
      //printf("match-instruction (*)> %d\n", ((int) Ix) >> 2); fflush(stdout);
      if (SCM_UNLIKELY(Ix == i_cons))
        {
          if (SCM_CONSP(x))
            {
              PUSH(SCM_CDR(x));
              x = SCM_CAR(x);
              M_CONTINUE;
            }
          goto return_false;
        }

      if (SCM_UNLIKELY(Ix == i_eq))
        {
          pat++;
          if (*pat == x) M_CONTINUE;
          goto return_false;
        }

      if (SCM_UNLIKELY(Ix == i_var))
        {
          pat++;
          LOCAL_SET((SCM_UNPACK(*pat) >> 2), x);
          M_CONTINUE;
        }

      if (SCM_UNLIKELY(Ix == i_pop))
        {
          POP(x);
          M_CONTINUE;
        }
      ...

And we see that the most used instructions are in the top 4 checks.
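To make the idea concrete, here is a minimal, self-contained sketch of that
kind of frequency-ordered if-chain dispatch. It is not the actual matcher
code above; the opcodes, the LIKELY/UNLIKELY macros and the tiny stack
machine are all illustrative. The point is just that the hottest opcode is
tested first, so the common case is resolved after one or two compares:

  #include <stdio.h>

  #ifdef __GNUC__
  #  define LIKELY(x)   __builtin_expect(!!(x), 1)
  #  define UNLIKELY(x) __builtin_expect(!!(x), 0)
  #else
  #  define LIKELY(x)   (x)
  #  define UNLIKELY(x) (x)
  #endif

  enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

  static int run(const int *ip)
  {
    int stack[64], sp = 0;
    for (;;)
      {
        int op = *ip++;
        /* hottest opcode tested first, coldest last */
        if (LIKELY(op == OP_PUSH)) { stack[sp++] = *ip++; continue; }
        if (op == OP_ADD) { sp--; stack[sp - 1] += stack[sp]; continue; }
        if (op == OP_MUL) { sp--; stack[sp - 1] *= stack[sp]; continue; }
        if (UNLIKELY(op == OP_HALT)) return stack[sp - 1];
      }
  }

  int main(void)
  {
    /* (2 + 3) * 4 */
    int prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
                   OP_PUSH, 4, OP_MUL, OP_HALT };
    printf("%d\n", run(prog));   /* prints 20 */
    return 0;
  }

This while loop is simple enough that gcc has a good chance of compiling it
into the same tight compare-and-branch sequence you would write by hand.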
I don't know how efficiently gcc compiles the named-label method used in
the guile vm. But this while loop is not exotic, so one should expect good
assembler from gcc. Also an SCM array is used, which might explain some of
the speedup. The match construct itself generates a wordier vm instruction
list as well, due to the general nature of the guile vm.

To see the compilation, ((a b) a) translates to

  <cons> <var> a.id <pop> <cons> <var> b.id <pop> <eq> '() <end>

8 instructions!! Compare with

  (if (pair? x1)
      (let ((a (car x1))
            (x2 (cdr x1)))
        (if (pair? x2)
            (let ((b (car x2)))
              (if (null? x2)
                  ...

i.e. 15-16 instructions, but some inefficiency in the match algorithm maybe
means a factor of 3 between them; it is also branchy, which seems to cost
extra cycles. But a 15x difference?

Anyway, I'm experimenting with dispatching directly to the correct code with

  if (SCM_LIKELY(Ix == i_end))   // <end> jmp-addr
    {
    gp_end:
      //printf("(sp: %d == sp_save: %d\n", sp, sp_save); fflush(stdout);
      int ijump = (*(pat + 1)) >> 2;
      if (SCM_UNLIKELY(ijump == 0))
        {
          scm_t_int32 offset;
          // ip: <obj> compiled.id <br> a1 b1 c1 <br> a2 b2 c2 ...
          //     |      row = 0        |  row = 1   | ...
          ip += 3 + row * 4;
          FETCH_OFFSET (offset);
          ip += offset;
          // Store the offset for a fast lookup of the address!!
          *(pat + 1) = SCM_PACK(((scm_t_bits) ((offset + 3 * row * 4) << 2)) + 2);
          if (offset < 0)
            VM_HANDLE_INTERRUPTS;
        }
      else
        ip += (*(pat + 1)) >> 2;
      NEXT;
    }

Modelling the function data with br and object-ref will make it possible to
quickly test out this idea. In a finished solution you need to make it less
hacky in its nature.

I also have similar code for usage in prolog-like matching, e.g.
unification + backtracking. I would expect at least the same speedup here.
Note that for this application a pure unwinding of the logic, similar to
the match macro, will yield very bloated code, and going for a targeted
virtual machine will make for lean and efficient code.
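For readers who want to play with the instruction encoding above, here is a
hypothetical, runnable toy version of such a pattern VM. It is not the real
implementation: the Cell type, opcode names and run_match function are all
made up for illustration. It executes the same <cons> <var> <pop> <eq> <end>
shape of program against a two-element list, binding a and b:

  #include <stdio.h>
  #include <stdlib.h>

  typedef struct Cell { int is_pair; int atom; struct Cell *car, *cdr; } Cell;

  enum { I_CONS, I_VAR, I_POP, I_EQ, I_END };

  /* Run a compiled pattern against x; bound variables land in env[]. */
  static int run_match(const int *pat, Cell *x, Cell **env)
  {
    Cell *stack[32];
    int sp = 0;
    for (;;)
      {
        switch (*pat++)
          {
          case I_CONS:              /* x must be a pair; descend into car */
            if (!x || !x->is_pair) return 0;
            stack[sp++] = x->cdr;
            x = x->car;
            break;
          case I_VAR:               /* bind current value to variable slot */
            env[*pat++] = x;
            break;
          case I_POP:               /* resume with the saved cdr */
            x = stack[--sp];
            break;
          case I_EQ:                /* atom must equal the literal operand */
            if (!x || x->is_pair || x->atom != *pat++) return 0;
            break;
          case I_END:
            return 1;
          }
      }
  }

  static Cell *pair(Cell *car, Cell *cdr)
  { Cell *c = malloc(sizeof *c); c->is_pair = 1; c->car = car; c->cdr = cdr; return c; }
  static Cell *atom(int v)
  { Cell *c = malloc(sizeof *c); c->is_pair = 0; c->atom = v; return c; }

  int main(void)
  {
    enum { NIL = -1 };              /* '() encoded as the atom -1 here */
    /* pattern (a b):
       <cons> <var 0> <pop> <cons> <var 1> <pop> <eq NIL> <end> */
    int pat[] = { I_CONS, I_VAR, 0, I_POP,
                  I_CONS, I_VAR, 1, I_POP,
                  I_EQ, NIL, I_END };
    Cell *x = pair(atom(10), pair(atom(20), atom(NIL)));
    Cell *env[2] = { 0, 0 };
    if (run_match(pat, x, env))
      printf("matched: a=%d b=%d\n", env[0]->atom, env[1]->atom);
    else
      printf("no match\n");
    return 0;
  }

Matching (10 20) against the pattern prints "matched: a=10 b=20"; the
interpreter loop is the compiled pattern, shared by all rows, which is the
whole point of the fast-match idea.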
I hope that in the end a prolog implementation using this approach would be
only 3-4 times slower than what you have with a more targeted platform like
gnu prolog. So in a sense this is a test of what you want: compare the
speed difference of a c-based versus a scheme-based solution. Still, I
expect the match construct to be easier to hack on and improve upon, and at
some point a direct match version will win because you can compile it
natively.

Hope the fog clears a little.

Cheers
Stefan