In article <[EMAIL PROTECTED]>, Miles Bader <[EMAIL PROTECTED]> writes:
> On Mon, 28 Mar 2005 09:47:09 +0900 (JST), Kenichi Handa <[EMAIL PROTECTED]>
> wrote:
>> To handle the regular expression "\\b" and "\\B" correctly
>> for Thai, we need a bigger change in regex.c.  For the
>> moment, I have no idea how to do that.

> Current extensions to "word syntax", using `word-separating-categories'
> etc., seem to do the correct thing with regexps.[*]  Perhaps some
> extension to that mechanism would work.

> For instance, what if entries in `word-separating-categories' could have an
> optional predicate function -- in addition to the current (CAT1 . CAT2)
> format, allow (CAT1 CAT2 PREDICATE-FUN), and only consider the entry to
> match if PREDICATE-FUN (with some appropriate args) also returns true?

The problem is that the innermost function,
re_match_2_internal, doesn't know about the original buffer
or Lisp string.  So, to make PREDICATE-FUN work, we would have
to generate a Lisp string on each call, and that would be
extremely slow.  And before anything else, is
re_match_2_internal a safe place to call a Lisp function?

> [*] I was surprised that this is true, and I don't understand why from
> my quick look at regex.c :-/ ...  But my simple tests seem to show
> that it does really work.  E.g., I can add '(?C . ?C) to
> `word-separating-categories', and then a regexp search will suddenly
> start considering every single kanji character as a standalone word.

I spent a fairly long time making it work. :-p

re_match_2_internal calls the macro WORD_BOUNDARY_P at the
proper places.  The same macro is also used in scan_words
(syntax.c).

---
Ken'ichi HANDA
[EMAIL PROTECTED]
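
To illustrate why a check of the WORD_BOUNDARY_P kind is cheap and why a Lisp predicate would not fit it easily, here is a minimal toy model in C. It is not the code in regex.c: every name in it (toy_pair, toy_separating, toy_category_of, toy_word_boundary_p, toy_self_check) is hypothetical, the category lookup is a crude code-point range test instead of Emacs's category tables, and the default rule (a boundary wherever the two categories differ) is a simplification. The point it is meant to show is only that the test sees two characters and a table of category pairs, with no buffer or string context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; NOT the real WORD_BOUNDARY_P from regex.c. */
struct toy_pair { char cat1; char cat2; };

/* Toy stand-in for `word-separating-categories', holding one entry
   that corresponds to (?C . ?C) from the message above.  */
static const struct toy_pair toy_separating[] = { { 'C', 'C' } };

/* Assumption: classify by code-point range only.  Real Emacs looks
   the categories up in a category table.  */
static char toy_category_of(unsigned int c)
{
    if (c >= 0x4E00 && c <= 0x9FFF)
        return 'C';             /* CJK Unified Ideographs */
    return 'l';                 /* everything else: treated as Latin */
}

/* Return 1 if a word boundary lies between word characters C1 and C2.
   Only the two characters are inspected, nothing around them.  */
static int toy_word_boundary_p(unsigned int c1, unsigned int c2)
{
    char k1 = toy_category_of(c1);
    char k2 = toy_category_of(c2);
    size_t i;

    /* An explicit separating pair forces a boundary.  */
    for (i = 0; i < sizeof toy_separating / sizeof toy_separating[0]; i++)
        if (toy_separating[i].cat1 == k1 && toy_separating[i].cat2 == k2)
            return 1;

    /* Simplified default: boundary only where the categories differ.  */
    return k1 != k2;
}

/* Usage example: U+6F22 and U+5B57 are two adjacent kanji.  */
static void toy_self_check(void)
{
    assert(toy_word_boundary_p(0x6F22, 0x5B57) == 1); /* kanji | kanji */
    assert(toy_word_boundary_p('a', 'b') == 0);       /* inside a word */
    assert(toy_word_boundary_p('a', 0x6F22) == 1);    /* Latin | kanji */
}
```

In this sketch a predicate entry would need more than c1 and c2 (for Thai, presumably the surrounding text), which is exactly the information the innermost matcher does not have at hand.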