On Friday 07 May 2010 01:59:13 pm Ludovic Courtès wrote:
> Hi Stefan,
>
> stefan <stefan.ta...@spray.se> writes:
> > I've been experimenting lately with an inline match construct, very much
> > like using compiled regexps.
>
> Sounds interesting.
>
> > That is, I created a tiny VM that was targeted to do matching. To show
> > it, consider
> >
> > guile> (def f ((x y 'a 'b) (+ x y)))
>
> (Such a ‘def’ construct would be less convenient than ‘match’ IMO.)
You have the full match, match-lambda etc. of course if you need it. I just
use a little syntactic sugar.

> > large patterns yield a speedup of 15 times according to my tests compared
> > with (ice-9 match).
> >
> > I used guile-1.9.10. Does this release have a lot of checks compiled in
> > so that the comparison is unfair?
>
> All list/pair accessors type-check their arguments (this has always been
> the case with Guile so that Scheme code cannot yield to segfaults and
> the like.)

Usually the generated match code is something like

  (if (pair? x)
      (if (equal? x data)
          (let ((a34 (cdr x))) ...)
          (goto-next))
      (goto-next))

and the checks are already there. So for this isolated system it should be
fine to skip the check. But I don't think it explains the speed difference.

> > Anyway, I will try to tweak the code even further, something along
> >
> > (object-ref 1)
> > (fast-match)
> > (br L1)
> > (br L2)
> > (br L3)
> > ....
> > (br Ln)
> >
> > Here the fast-match routine can FETCH the br commands and issue them
> > directly, and hence one can have one compiled pattern instead of one for
> > each row in the matcher.
>
> I don’t quite understand how it differs from what happens without a
> ‘fast-match’ instruction. Can you show the code for ‘fast-match’?

The core of the matcher is a loop with a set of ifs arranged so that on
average it will do 2-3 checks and then find the instruction, so no named
labels. The core of the code looks like:

  #define M_CONTINUE {pat++; continue;}

  while (1)
    {
      //printf("match-instruction (*)> %d\n", ((int) Ix) >> 2); fflush(stdout);
      if (SCM_UNLIKELY(Ix == i_cons))
        {
          if (SCM_CONSP(x))
            {
              PUSH(SCM_CDR(x));
              x = SCM_CAR(x);
              M_CONTINUE;
            }
          goto return_false;
        }

      if (SCM_UNLIKELY(Ix == i_eq))
        {
          pat++;
          if (*pat == x) M_CONTINUE;
          goto return_false;
        }

      if (SCM_UNLIKELY(Ix == i_var))
        {
          pat++;
          LOCAL_SET((SCM_UNPACK(*pat) >> 2), x);
          M_CONTINUE;
        }

      if (SCM_UNLIKELY(Ix == i_pop))
        {
          POP(x);
          M_CONTINUE;
        }
      ...

And we see that the most used instructions are in the top 4 checks.
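To make the idea concrete, here is a minimal, self-contained sketch of that
kind of frequency-ordered if-chain dispatch. It is not the actual matcher
code above; the opcodes, the LIKELY/UNLIKELY macros and the tiny stack
machine are all illustrative. The point is just that the hottest opcode is
tested first, so the common case is resolved after one or two compares:

  #include <stdio.h>

  #ifdef __GNUC__
  #  define LIKELY(x)   __builtin_expect(!!(x), 1)
  #  define UNLIKELY(x) __builtin_expect(!!(x), 0)
  #else
  #  define LIKELY(x)   (x)
  #  define UNLIKELY(x) (x)
  #endif

  enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

  static int run(const int *ip)
  {
    int stack[64], sp = 0;
    for (;;)
      {
        int op = *ip++;
        /* hottest opcode tested first, coldest last */
        if (LIKELY(op == OP_PUSH)) { stack[sp++] = *ip++; continue; }
        if (op == OP_ADD) { sp--; stack[sp - 1] += stack[sp]; continue; }
        if (op == OP_MUL) { sp--; stack[sp - 1] *= stack[sp]; continue; }
        if (UNLIKELY(op == OP_HALT)) return stack[sp - 1];
      }
  }

  int main(void)
  {
    /* (2 + 3) * 4 */
    int prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
                   OP_PUSH, 4, OP_MUL, OP_HALT };
    printf("%d\n", run(prog));   /* prints 20 */
    return 0;
  }

This while loop is simple enough that gcc has a good chance of compiling it
into the same tight compare-and-branch sequence you would write by hand.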
I don't know how efficiently gcc compiles the named-label method used in
the guile vm. But this while loop is not exotic, so one should expect good
assembler from gcc. Also an SCM array is used, which might explain some of
the speedup. The match construct itself generates a wordier vm instruction
list as well, due to the general nature of the guile vm.

To see the compilation, ((a b) a) translates to

  <cons> <var> a.id <pop> <cons> <var> b.id <pop> <eq> '() <end>

8 instructions!! Compare with

  (if (pair? x1)
      (let ((a (car x1))
            (x2 (cdr x1)))
        (if (pair? x2)
            (let ((b (car x2)))
              (if (null? x2)
                  ...

i.e. 15-16 instructions, but some inefficiency in the match algorithm maybe
means a factor of 3 between them; it is also branchy, which seems to cost
extra cycles. But a 15x difference?

Anyway, I'm experimenting with dispatching directly to the correct code with

  if (SCM_LIKELY(Ix == i_end))   // <end> jmp-addr
    {
    gp_end:
      //printf("(sp: %d == sp_save: %d\n", sp, sp_save); fflush(stdout);
      int ijump = (*(pat + 1)) >> 2;
      if (SCM_UNLIKELY(ijump == 0))
        {
          scm_t_int32 offset;
          // ip: <obj> compiled.id <br> a1 b1 c1 <br> a2 b2 c2 ...
          //     |      row = 0        |  row = 1   | ...
          ip += 3 + row * 4;
          FETCH_OFFSET (offset);
          ip += offset;
          // Store the offset for a fast lookup of the address!!
          *(pat + 1) = SCM_PACK(((scm_t_bits) ((offset + 3 * row * 4) << 2)) + 2);
          if (offset < 0)
            VM_HANDLE_INTERRUPTS;
        }
      else
        ip += (*(pat + 1)) >> 2;
      NEXT;
    }

Modelling the function data with br and object-ref will make it possible to
quickly test out this idea. In a finished solution you need to make it less
hacky in its nature.

I also have similar code for usage in prolog-like matching, e.g.
unification + backtracking. I would expect at least the same speedup here.
Note that for this application a pure unwinding of the logic, similar to
the match macro, will yield very bloated code, and going for a targeted
virtual machine will make for lean and efficient code.
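For readers who want to play with the instruction encoding above, here is a
hypothetical, runnable toy version of such a pattern VM. It is not the real
implementation: the Cell type, opcode names and run_match function are all
made up for illustration. It executes the same <cons> <var> <pop> <eq> <end>
shape of program against a two-element list, binding a and b:

  #include <stdio.h>
  #include <stdlib.h>

  typedef struct Cell { int is_pair; int atom; struct Cell *car, *cdr; } Cell;

  enum { I_CONS, I_VAR, I_POP, I_EQ, I_END };

  /* Run a compiled pattern against x; bound variables land in env[]. */
  static int run_match(const int *pat, Cell *x, Cell **env)
  {
    Cell *stack[32];
    int sp = 0;
    for (;;)
      {
        switch (*pat++)
          {
          case I_CONS:              /* x must be a pair; descend into car */
            if (!x || !x->is_pair) return 0;
            stack[sp++] = x->cdr;
            x = x->car;
            break;
          case I_VAR:               /* bind current value to variable slot */
            env[*pat++] = x;
            break;
          case I_POP:               /* resume with the saved cdr */
            x = stack[--sp];
            break;
          case I_EQ:                /* atom must equal the literal operand */
            if (!x || x->is_pair || x->atom != *pat++) return 0;
            break;
          case I_END:
            return 1;
          }
      }
  }

  static Cell *pair(Cell *car, Cell *cdr)
  { Cell *c = malloc(sizeof *c); c->is_pair = 1; c->car = car; c->cdr = cdr; return c; }
  static Cell *atom(int v)
  { Cell *c = malloc(sizeof *c); c->is_pair = 0; c->atom = v; return c; }

  int main(void)
  {
    enum { NIL = -1 };              /* '() encoded as the atom -1 here */
    /* pattern (a b):
       <cons> <var 0> <pop> <cons> <var 1> <pop> <eq NIL> <end> */
    int pat[] = { I_CONS, I_VAR, 0, I_POP,
                  I_CONS, I_VAR, 1, I_POP,
                  I_EQ, NIL, I_END };
    Cell *x = pair(atom(10), pair(atom(20), atom(NIL)));
    Cell *env[2] = { 0, 0 };
    if (run_match(pat, x, env))
      printf("matched: a=%d b=%d\n", env[0]->atom, env[1]->atom);
    else
      printf("no match\n");
    return 0;
  }

Matching (10 20) against the pattern prints "matched: a=10 b=20"; the
interpreter loop is the compiled pattern, shared by all rows, which is the
whole point of the fast-match idea.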
I hope that in the end a prolog implementation using this approach would be
only 3-4 times slower than what you have with a more targeted platform like
gnu prolog. So in a sense this is a test of what you want: compare the
speed difference of a c-based versus a scheme-based solution. Still, I
expect the match construct to be easier to hack on and improve upon, and at
some point a direct match version will win because you can compile it
natively.

Hope the fog clears a little.

Cheers
Stefan