Daniel Naber <daniel.na...@languagetool.org> wrote:

> On 2015-10-07 06:41, Dominique Pellé wrote:
>
> Hi Dominique,
>
> thanks for your feedback.
>
>> 1) How do I highlight only a subset of the match?   Trying the above
>> rule, I see this:
>
> That's not yet possible, but I like the idea of a 'marker' attribute.
> I'll add that to my TODO list.

Good.

>> 2) Is there always an implicit word boundary at the beginning or end
>> of <regexp>?
>
> There's no implicit boundary. How does Grammalecte deal with this?

I'm not sure.

I see rules like this with explicit \b:

__typo__  \betc([.][.][.]|…) -> etc.         # Un seul point après « etc. »

On the other hand, most rules are without \b like this:

__tu__  science fiction -> science-fiction   # Il manque un trait d’union.

Perhaps Oliver R. (author of Grammalecte, in CC) can comment on
whether there is an implicit \b at the beginning and end of regexps.
Is the format of Grammalecte rules documented?

I think the best for LT would be to apply \b implicitly at the
beginning and end of the regexp, with an option to disable it,
e.g. <regexp word_boundary="no">, which should rarely be
needed. I can't think of a good short name for that option.
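
To make the proposal concrete, here is a minimal Java sketch of what
an implicit boundary could look like. It is only an illustration:
the class BoundaryDemo and the helpers compile() and found() are
hypothetical names, not LanguageTool code, and the wordBoundary flag
stands in for the word_boundary attribute suggested above.

```java
import java.util.regex.Pattern;

final class BoundaryDemo {
    /**
     * Hypothetical helper (not LanguageTool API): compiles a rule's regexp,
     * wrapping it in \b...\b unless wordBoundary is false.
     */
    static Pattern compile(String regexp, boolean wordBoundary) {
        // The non-capturing group keeps a top-level alternation such as "a|b"
        // inside both boundaries.
        String source = wordBoundary ? "\\b(?:" + regexp + ")\\b" : regexp;
        // UNICODE_CHARACTER_CLASS makes \b treat accented letters as word characters.
        return Pattern.compile(source, Pattern.UNICODE_CHARACTER_CLASS);
    }

    /** True if the (optionally wrapped) regexp matches somewhere in the sentence. */
    static boolean found(String regexp, boolean wordBoundary, String sentence) {
        return compile(regexp, wordBoundary).matcher(sentence).find();
    }

    public static void main(String[] args) {
        String rule = "science fiction";
        // Without a boundary the rule fires inside "conscience fictionnelle".
        System.out.println(found(rule, false, "sa conscience fictionnelle")); // true
        System.out.println(found(rule, true, "sa conscience fictionnelle"));  // false
        System.out.println(found(rule, true, "un roman de science fiction")); // true

        // The "etc" rule ends in punctuation (U+2026 is the ellipsis character),
        // so a trailing \b would demand a word character right after it.
        String etc = "etc([.][.][.]|\u2026)";
        String text = "des pommes, etc... et des poires";
        System.out.println(found(etc, true, text));           // false
        System.out.println(found("\\b" + etc, false, text));  // true
    }
}
```

The last two lines hint at where the opt-out would be needed: a rule
whose regexp starts or ends with punctuation no longer matches once
\b is added on that side.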

>> I wonder whether there is a performance impact.
>
> I just ran a performance test and changing 320 German rules to regex
> makes checking ~10% slower. For me, 10% is not a value I care about,
> especially as other languages like English are much slower anyway.

OK. 10% isn't that small in my opinion.  I'll probably end
up using <regexp ...> only when it helps to merge 2 or more
rules into 1 rule, mostly because it makes grammar.xml more
maintainable. Maybe having fewer rules will then even
compensate for the slowdown due to regexp matching on
sentences.

The slowdown could depend on the text you check: I'd
expect it to be worse on very long sentences if Java's regexp
engine (which backtracks rather than running as a DFA) is worse
than O(n) for some regexps, where n is the number of chars in
the matched sentence. But if the regexps are simple enough, I
suspect they will not trigger complexity worse than O(n).
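
One way to check that suspicion without relying on noisy timings is
to count how many characters the engine reads while scanning
sentences of growing length. The sketch below does this with a
CharSequence wrapper. The names RegexCost, CountingText, reads() and
sentence() are made up for this example, and the number of charAt
calls is only a rough proxy for matching cost, not a real benchmark.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexCost {
    /** Wraps a text and counts how often the regexp engine reads a character. */
    static final class CountingText implements CharSequence {
        private final String text;
        long reads;

        CountingText(String text) { this.text = text; }

        @Override public int length() { return text.length(); }
        @Override public char charAt(int index) { reads++; return text.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return text.subSequence(start, end);
        }
        @Override public String toString() { return text; }
    }

    /** Number of charAt calls made while finding all matches of regexp in sentence. */
    static long reads(String regexp, String sentence) {
        CountingText counted = new CountingText(sentence);
        Matcher matcher = Pattern.compile(regexp).matcher(counted);
        while (matcher.find()) {
            // Only the side effect on counted.reads matters here.
        }
        return counted.reads;
    }

    /** Builds an artificial sentence of the given number of words, ending in "etc...". */
    static String sentence(int words) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words; i++) {
            sb.append("mot ");
        }
        return sb.append("etc...").toString();
    }

    public static void main(String[] args) {
        // The "etc" rule quoted earlier; U+2026 is the ellipsis character.
        String rule = "\\betc([.][.][.]|\u2026)";
        for (int words = 1000; words <= 16000; words *= 2) {
            System.out.println(words + " words: " + reads(rule, sentence(words)) + " reads");
        }
    }
}
```

If the read count roughly doubles each time the sentence doubles, the
rule behaves linearly on that input. A rule whose count grows much
faster would be one to rewrite or keep as a token-based rule.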

Regards
Dominique

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
