In article <[EMAIL PROTECTED]>, Miles Bader <[EMAIL PROTECTED]> writes:

> On Mon, 28 Mar 2005 09:47:09 +0900 (JST), Kenichi Handa <[EMAIL PROTECTED]> 
> wrote:
>>  To handle the regular expression "\\b" and "\\B" correctly
>>  for Thai, we need a bigger change in regex.c.  For the
>>  moment, I have no idea how to do that.

> Current extensions to "word syntax", using `word-separating-categories'
> etc., seem to do the correct thing with regexps.[*]  Perhaps some
> extension to that mechanism would work.

> For instance, what if entries in `word-separating-categories' could have an
> optional predicate function -- in addition to the current (CAT1 . CAT2)
> format, allow (CAT1 CAT2 PREDICATE-FUN), and only consider the entry to
> match if PREDICATE-FUN (with some appropriate args) also returns true?

The problem is that the innermost function,
re_match_2_internal, doesn't know about the original buffer
or Lisp string.  So, to make PREDICATE-FUN work, we would
have to generate a Lisp string on each call, and that would
be extremely slow.  And more fundamentally, is
re_match_2_internal a safe place to call a Lisp function at
all?

> [*] I was surprised that this is true, and I don't understand why from
>     my quick look at regex.c :-/ ... But my simple tests seem to show
>     that it does really work.  E.g., I can add '(?C . ?C) to
>     `word-separating-categories', and then a regexp search will suddenly
>     start considering every single kanji character as a standalone word.

I spent a fairly long time making it work. :-p
re_match_2_internal calls the macro WORD_BOUNDARY_P at the
proper places.  The same macro is also used in scan_words
(syntax.c).
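
As a rough illustration of why a (CAT1 . CAT2) entry can be checked
cheaply from inside the matcher, here is a toy model in C.  It is only
a sketch: toy_category, toy_word_boundary_p and struct category_pair
are hypothetical names invented for this example, the code-point
ranges are a simplification, and the real WORD_BOUNDARY_P may apply
further rules that are not modelled here.  The point is that the check
needs only the two adjacent characters, not the buffer or Lisp string.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a character category such as ?C. */
typedef char category_t;

/* One (CAT1 . CAT2) entry: a boundary lies between a character
   in category `before` and a following character in `after`. */
struct category_pair {
  category_t before;
  category_t after;
};

/* Toy category lookup (an assumption, not Emacs's category table). */
category_t toy_category(unsigned c)
{
  if (c >= 0x4E00 && c <= 0x9FFF) return 'C';  /* Han ideographs */
  if (c >= 0x3040 && c <= 0x309F) return 'H';  /* hiragana */
  if (c >= 0x30A0 && c <= 0x30FF) return 'K';  /* katakana */
  return 'a';                                  /* everything else */
}

/* Return true if some entry in `seps` separates c1 from c2.
   Only the two characters are consulted; no buffer is needed. */
bool toy_word_boundary_p(unsigned c1, unsigned c2,
                         const struct category_pair *seps, size_t n)
{
  category_t k1 = toy_category(c1);
  category_t k2 = toy_category(c2);
  for (size_t i = 0; i < n; i++)
    if (seps[i].before == k1 && seps[i].after == k2)
      return true;
  return false;
}
```

With a single entry { 'C', 'C' }, the toy check reports a boundary
between any two adjacent Han characters, which mirrors the effect
described above of adding '(?C . ?C).  A predicate function would
break this property, because it would need more context than the two
characters.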

---
Ken'ichi HANDA
[EMAIL PROTECTED]


