Daniel Naber <daniel.na...@languagetool.org> wrote:
> On 2015-09-05 22:53, Dominique Pellé wrote:
>
>> It is similar to what Daniel wrote earlier as well:
>>
>> <regex>a (plein temps|chaque fois|rude épreuve|vol d’oiseau)</regex>
>
> So instead of <pattern><token>...</token></pattern> we would have
> <regex>...</regex>, but it couldn't be combined with <token>, is that
> right?
>
> We'll need to decide if <regex>a plein temps</regex> implies \b at the
> start and end of the regex, i.e. whether it would also match "la plein
> temps" or not. If it implies \b and you want to match a word ending in
> 'a', you'd use <regex>.*a</regex> (need to test if "\b.*a" actually
> works), otherwise <regex>a\b</regex>.
>
> Regards
> Daniel
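To make the \b question concrete, here is a small sketch using plain java.util.regex, not LanguageTool's rule engine. It assumes the implicit boundaries would be added by wrapping the whole expression as \b(?:...)\b; withWordBoundaries, matches and firstMatch are hypothetical helpers. It shows that, without boundaries, the regex from the thread also fires inside "la plein temps", and that with boundaries "\b.*a" does match but the greedy .* can run across several words, whereas \w*a stays within one word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {

    // Hypothetical helper: wrap a rule's regex in word boundaries.
    // The non-capturing group keeps a top-level alternation such as
    // "foo|bar" from turning into "\bfoo" OR "bar\b".
    static String withWordBoundaries(String regex) {
        return "\\b(?:" + regex + ")\\b";
    }

    // True if the regex matches anywhere in the text (unanchored find()).
    static boolean matches(String regex, String text) {
        return Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS).matcher(text).find();
    }

    // Text of the first match, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Regex from the thread. Unicode escapes: e-acute (U+00E9), right single quote (U+2019).
        String rule = "a (plein temps|chaque fois|rude \u00e9preuve|vol d\u2019oiseau)";
        String error = "Il travaille a plein temps.";
        String unrelated = "Il travaille la plein temps.";

        System.out.println(matches(rule, error));                          // true
        System.out.println(matches(rule, unrelated));                      // true: "a plein temps" is inside "la plein temps"
        System.out.println(matches(withWordBoundaries(rule), error));      // true
        System.out.println(matches(withWordBoundaries(rule), unrelated));  // false: no boundary between "l" and "a"

        // With implicit boundaries, ".*a" works but can span several words.
        System.out.println(firstMatch(withWordBoundaries(".*a"), "Il va la voir."));   // Il va la
        System.out.println(firstMatch(withWordBoundaries("\\w*a"), "Il va la voir.")); // va
    }
}
```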
It would be more powerful if <regexp>...</regexp> could be used inside a <pattern>, so that we can combine the best of <token>...</token> and <regexp>...</regexp>, for example:

<pattern>
  <token postag="[NJ] .*" postag_regex="yes" inflected="yes">foobar</token>
  <regexp>xxx.*yyy|abc[def]</regexp>
</pattern>

In the above example, the <regexp> would only match right after the first token, foobar, rather than anywhere in the whole sentence. If <regexp> cannot be combined with <token>, then we lose the ability to use inflected="yes", postag="...", etc.

But I wonder how expensive this is. I assume that matching against tokens is faster than applying a regexp to whole sentences. If it slows down LT a lot (to be confirmed), then we should not abuse the <regexp>...</regexp> feature.

Regards
Dominique

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
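The proposal of a <regexp> that only matches right after the preceding token could work roughly as below. This is a hypothetical sketch in plain Java, not LanguageTool code: matchAfterToken is an invented helper, and it assumes the <token> step has already reported the character offset where "foobar" ends and that only spaces separate it from the following text. Matcher.region() restricts matching to the rest of the sentence and lookingAt() anchors the match at the region start, so only one start position is tried, whereas an unanchored find() over the whole sentence tries every position. Whether this is fast enough in practice still has to be measured, as noted above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenThenRegexp {

    // Hypothetical helper: returns the text matched by regexp when it starts
    // right after the token ending at tokenEnd (spaces skipped), else null.
    static String matchAfterToken(String sentence, int tokenEnd, Pattern regexp) {
        int start = tokenEnd;
        while (start < sentence.length() && sentence.charAt(start) == ' ') {
            start++;
        }
        Matcher m = regexp.matcher(sentence);
        m.region(start, sentence.length());
        // lookingAt() requires the match to begin at the region start,
        // but (unlike matches()) not to consume the whole region.
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        Pattern regexp = Pattern.compile("xxx.*yyy|abc[def]");
        // Stand-in for the <token> step: assume "foobar" matched at offset 0.
        int tokenEnd = "foobar".length();

        String hit = "foobar abce and later xxx then yyy";
        System.out.println(matchAfterToken(hit, tokenEnd, regexp));   // abce

        String miss = "foobar and later xxx then yyy";
        System.out.println(matchAfterToken(miss, tokenEnd, regexp));  // null
        // An unanchored search over the whole sentence still finds "xxx then yyy".
        System.out.println(regexp.matcher(miss).find());              // true
    }
}
```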