Daniel Naber <daniel.na...@languagetool.org> wrote:

> On 2015-09-05 22:53, Dominique Pellé wrote:
>
>> It is similar to what Daniel wrote earlier as well:
>>
>> <regex>a (plein temps|chaque fois|rude épreuve|vol
>> d’oiseau)</regex>
>
> So instead of <pattern><token>...</token></pattern> we would have
> <regex>...</regex>, but it couldn't be combined with <token>, is that
> right?
>
> We'll need to decide if <regex>a plein temps</regex> implies \b at the
> start and end of the regex, i.e. whether it would also match "la plein
> temps" or not. If it implies \b and you want to match a word ending in
> 'a', you'd use <regex>.*a</regex> (need to test if "\b.*a" actually
> works), otherwise <regex>a\b</regex>.
>
> Regards
>   Daniel

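To make the \b question concrete, here is a small sketch. It uses Python's
re module purely for illustration (LT would use Java regexes, which may
differ in details). The helper find() and its implied_boundaries flag are
made up for this example; they are not an existing LT option.

```python
import re

def find(pattern, text, implied_boundaries):
    """Return the matched text, or None if there is no match.

    When implied_boundaries is True, the pattern is wrapped in word
    boundaries, which is one possible meaning of "implies \\b".
    """
    if implied_boundaries:
        pattern = r"\b(?:" + pattern + r")\b"
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Without implied boundaries the phrase is found inside "la plein temps".
print(find(r"a plein temps", "la plein temps", implied_boundaries=False))  # a plein temps
# With implied boundaries it is not, because "a" is the end of "la".
print(find(r"a plein temps", "la plein temps", implied_boundaries=True))   # None
# ".*a" does match with implied boundaries, but "." also matches spaces.
print(find(r".*a", "il va bien", implied_boundaries=True))                 # il va
# "\w*a" stays inside a single word.
print(find(r"\w*a", "il va bien", implied_boundaries=True))                # va
```

So under this reading, "\b.*a" does work in the sense that it matches, but
it can span several words. Something like \w*a would be needed to match
only one word ending in 'a'.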

It would be more powerful if <regexp>...</regexp> could be placed inside a
<pattern>, so we can combine the best of <token>...</token> and
<regexp>...</regexp>, like this for example:

<pattern>
  <token postag="[NJ] .*" postag_regexp="yes" inflected="yes">foobar</token>
  <regexp>xxx.*yyy|abc[def]</regexp>
</pattern>

In the above example, the <regexp> would match only right after the first
token foobar, rather than on the whole sentence.
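
A rough sketch of what "match only right after the token" could mean, again
in Python just for illustration. This is not how LT is implemented: the
function name match_token_then_regex is hypothetical, tokens are split on
whitespace only, and the postag/inflected checks are left out because they
would need a tagger.

```python
import re

def match_token_then_regex(sentence, token_text, regex):
    """Return (start, end) spans where token_text is directly followed by regex.

    Assumption: tokens are whitespace-separated; the regex is anchored at the
    first non-space character after the token instead of being searched for
    in the whole sentence.
    """
    compiled = re.compile(regex)
    spans = []
    for tok in re.finditer(r"\S+", sentence):
        if tok.group(0) != token_text:
            continue
        # Skip the whitespace after the token, then anchor the regex there.
        start = tok.end()
        while start < len(sentence) and sentence[start].isspace():
            start += 1
        m = compiled.match(sentence, start)
        if m:
            spans.append((tok.start(), m.end()))
    return spans

regex = r"xxx.*yyy|abc[def]"
# The regex matches right after "foobar", so the span covers "foobar abcd".
print(match_token_then_regex("a foobar abcd here", "foobar", regex))  # [(2, 13)]
# "abcd" occurs later in the sentence but not right after the token.
print(match_token_then_regex("foobar zzz abcd", "foobar", regex))     # []
```

With a whole-sentence regex the second sentence would match as well, which
is the difference I mean.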

If <regexp> cannot be combined with <token> then we lose the ability to have
inflected="yes", postag="...", etc.

But I wonder how expensive regexp matching would be.  I assume that matching
on tokens is faster than applying a regexp to whole sentences. If it slows down
LT a lot (to be confirmed), then we should not overuse the <regexp>...</regexp>
feature.

Regards
Dominique

