Daniel Naber <daniel.na...@languagetool.org> wrote:

> On 2015-10-07 06:41, Dominique Pellé wrote:
>
> Hi Dominique,
>
> thanks for your feedback.
>
>> 1) How do I highlight only a subset of the match? Trying the above
>> rule, I see this:
>
> That's not yet possible, but I like the idea of a 'marker' attribute.
> I'll add that to my TODO list.

Good.

>> 2) Is there always an implicit word boundary at the beginning or end
>> of <regexp>?
>
> There's no implicit boundary.

How does Grammalecte deal with this? I'm not sure. I see rules
like this with an explicit \b:

__typo__ \betc([.][.][.]|…) -> etc. # Un seul point après « etc. »

On the other hand, most rules are without \b, like this:

__tu__ science fiction -> science-fiction # Il manque un trait d’union.

Perhaps Oliver R. in CC (author of Grammalecte) can comment on
whether there is an implicit \b at the beginning and end of regexps.
Is the format of Grammalecte rules documented?

I think the best for LT would be to use \b implicitly at the
beginning and end of the regexp, but to have an option to disable
it with <regexp word_boundary="no">, which will rarely need to be
used. I can't think of a good short name for that option.

>> I wonder whether there is a performance impact.
>
> I just ran a performance test and changing 320 German rules to regex
> makes checking ~10% slower. For me, 10% is not a value I care about,
> especially as other languages like English are much slower anyway.

OK. 10% isn't that small in my opinion. I'll probably end up using
<regexp ...> only when it helps to merge 2 or more rules into 1 rule,
mostly because that makes grammar.xml more maintainable. Having
fewer rules might then even compensate for the slowdown due to regexp
matching on sentences.

The slowdown could depend on the text you check: I'd expect
it to be worse on very long sentences if Java's regexp engine is
worse than O(n) for some regexps, where n is the number of
characters in the matched sentence. But if the regexps are simple
enough, I suspect they will not trigger complexity worse than O(n).

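Coming back to the implicit \b idea: here is a minimal sketch of what
I have in mind. It uses Python's re module only as a stand-in, it is
not LT code, and Java's \b may differ in details. The names
compile_rule, find_matches and the word_boundary parameter are made up
for illustration.

```python
import re


def compile_rule(pattern: str, word_boundary: bool = True) -> re.Pattern:
    """Compile a rule regexp, adding \\b at both ends unless disabled.

    Hypothetical helper: word_boundary=False plays the role of the
    proposed <regexp word_boundary="no"> opt-out.
    """
    if word_boundary:
        # Non-capturing group so that a top-level alternation such as
        # "a|b" is anchored as a whole, not only its first/last branch.
        pattern = r"\b(?:" + pattern + r")\b"
    return re.compile(pattern)


def find_matches(pattern: str, text: str, word_boundary: bool = True) -> list[str]:
    """Return the matched substrings of the rule regexp in text."""
    return [m.group(0) for m in compile_rule(pattern, word_boundary).finditer(text)]


# With the implicit boundary, "etc" inside "fetch" is not matched.
print(find_matches("etc", "fetch etc"))
# Without it, both occurrences are matched.
print(find_matches("etc", "fetch etc", word_boundary=False))
# A rule ending in punctuation never matches with an implicit trailing \b,
# because there is no word boundary between "." and the following space.
print(find_matches(r"etc([.][.][.]|…)", "etc... ok"))
# The same rule with an explicit leading \b and the opt-out does match.
print(find_matches(r"\betc([.][.][.]|…)", "etc... ok", word_boundary=False))
```

The third call shows why the opt-out would be needed at all: a rule
like the Grammalecte __typo__ one above ends in punctuation, so a
trailing \b added automatically would silently disable it.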
Regards
Dominique