Re: ignoring certain tokens in rules

2016-05-06 Thread Dominique Pellé
Jaume Ortolà i Font  wrote:

> Hi,
>
> I think Marcin talked about this idea some time ago.
>
> Sometimes tokens like quotation marks (or other characters) should be ignored
> in some rules. That is, the sentence should be checked as if these tokens
> were not present. Any idea how it could be implemented?
>
> Alternatively, tokens like these would have to be added to the patterns:
>
> [“‘”«"']
>
> I would need to modify a few dozen rules. But perhaps this is the best
> solution: it gives more control over the rule, the suggestions, possible
> false alarms, and so on. What do you think?
>
> Regards,
> Jaume Ortolà

I have not looked in detail at what the French grammar checker
Grammalecte [1] does, but I think that it checks input text
in multiple passes. In some passes, pre-processor rules eliminate
pieces of text. For example, the pre-processor can eliminate
"useless" punctuation or multi-word locutions.

For example, I see pre-processor rules such as these in Grammalecte:

[«»“”„"`¹²³⁴⁵⁶⁷⁸⁹⁰]+ -> *
This rule eliminates a few "useless" characters.

[(]\w+[)] -> *
This rule eliminates a word in parentheses, such as (foo).
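A minimal Java sketch of such a pre-processor pass, applying the two patterns above in order. It assumes that "-> *" means the match is blanked out, and it replaces each matched character by a space so that offsets found in the cleaned text still point at the same positions in the original; that reading of "*" is an assumption, not Grammalecte's documented behaviour. The class name PreProcessor is hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreProcessor {
  // The two pre-processor patterns quoted above, written as Java regexes.
  // UNICODE_CHARACTER_CLASS lets \w match accented letters such as "é".
  private static final List<Pattern> RULES = List.of(
      Pattern.compile("[«»“”„\"`¹²³⁴⁵⁶⁷⁸⁹⁰]+"),
      Pattern.compile("[(]\\w+[)]", Pattern.UNICODE_CHARACTER_CLASS));

  // Assumption: "-> *" blanks out the match. Each matched character becomes
  // a space, so the cleaned text has the same length as the input.
  public static String apply(String text) {
    String result = text;
    for (Pattern rule : RULES) {
      Matcher m = rule.matcher(result);
      StringBuilder sb = new StringBuilder();
      while (m.find()) {
        m.appendReplacement(sb, " ".repeat(m.end() - m.start()));
      }
      m.appendTail(sb);
      result = sb.toString();
    }
    return result;
  }

  public static void main(String[] args) {
    String input = "Il a dit «bonjour» (déjà) à tous.";
    String output = apply(input);
    System.out.println("[" + output + "]");
    System.out.println(input.length() == output.length()); // true
  }
}
```

Keeping the length unchanged is a design choice: a rule match in the cleaned text can then be reported at the same offset in the user's original text without any mapping table.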

The important thing to keep in mind is that the sentence is checked
multiple times. For example:
* The first pass checks the text as-is.
* The second pass checks the text again, after the pre-processor rules
  have been applied.
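The two passes could be sketched in Java as follows. The "rule" is a hypothetical toy check for a repeated word, and the pre-processing step simply replaces each quotation mark by a space (an assumption chosen so that error offsets from the second pass still refer to the original text). In the sentence `Il a dit le «le» mot.` the first pass misses the repetition because the second "le" is glued to its quotation marks; the second pass finds it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiPassChecker {
  private static final Pattern WORD = Pattern.compile("\\S+");

  // Hypothetical toy rule: reports the start offset of every
  // whitespace-separated token that repeats the previous one.
  static List<Integer> doubledWords(String text) {
    List<Integer> hits = new ArrayList<>();
    Matcher m = WORD.matcher(text);
    String prev = null;
    while (m.find()) {
      String word = m.group();
      if (word.equalsIgnoreCase(prev)) {
        hits.add(m.start());
      }
      prev = word;
    }
    return hits;
  }

  // Pre-processing: replace each quotation mark by one space, so
  // offsets in the cleaned text match offsets in the original.
  static String stripQuotes(String text) {
    return text.replaceAll("[«»“”„\"`]", " ");
  }

  // First pass: the text as-is. Second pass: the pre-processed text.
  // An error found by both passes is reported only once.
  static Set<Integer> check(String text) {
    Set<Integer> errors = new LinkedHashSet<>(doubledWords(text));
    errors.addAll(doubledWords(stripQuotes(text)));
    return errors;
  }

  public static void main(String[] args) {
    String s = "Il a dit le «le» mot.";
    System.out.println(doubledWords(s)); // first pass alone: []
    System.out.println(check(s));        // both passes: [13]
  }
}
```

Running the first pass on the unmodified text keeps any rule that does care about the quotation marks working as before; the second pass only adds matches.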

It seems like a good idea.

Regards
Dominique

[1] http://www.dicollecte.org/grammalecte/

___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


Re: ignoring certain tokens in rules

2016-05-06 Thread Jaume Ortolà i Font
Hi,

In fact, the problem is a bit more complicated than I expected, because the
disambiguation rules would also need to ignore the quotation-mark tokens.
So it would be necessary to add a lot of  everywhere, and
it would probably become unmanageable.

A more general solution:
- In AnalyzedSentence, remove the tokens consisting only of quotation marks,
but only in getTokensWithoutWhitespace().
- Add two fields to AnalyzedTokenReadings, leftQuotationMark and
rightQuotationMark, which hold the quotation marks adjacent to the word
(on neither side, one side, or both sides).
- Run everything as usual with the new
getTokensWithoutWhitespace() (disambiguation, grammar rules, etc.).
- Retrieve leftQuotationMark and rightQuotationMark when necessary, for
example when building suggestions.
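A self-contained Java sketch of the folding step, under stated assumptions: Token is a hypothetical stand-in for AnalyzedTokenReadings (it is not LanguageTool's API), the input is a token list that still contains whitespace tokens, and a quote-only token counts as an opening mark when a word follows it immediately, otherwise as a closing mark of the preceding token. That heuristic ignores the French spacing inside « », one of the difficulties listed below.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteFolding {
  // Hypothetical stand-in for AnalyzedTokenReadings, carrying the two
  // proposed fields. Not LanguageTool's actual class.
  static final class Token {
    final String text;
    String leftQuotationMark = "";
    String rightQuotationMark = "";

    Token(String text) {
      this.text = text;
    }

    // Suggestion helper: put the stored marks back around a replacement.
    String wrap(String replacement) {
      return leftQuotationMark + replacement + rightQuotationMark;
    }
  }

  static boolean isQuote(String s) {
    return s.matches("[“‘”’«»\"']+");
  }

  static boolean isWord(String s) {
    return !s.isEmpty() && Character.isLetterOrDigit(s.codePointAt(0));
  }

  // Drops whitespace and quote-only tokens, storing each quote on the
  // neighbouring token: as a left mark if a word follows immediately,
  // otherwise as a right mark of the previous token. In this sketch a
  // closing mark with no preceding token is simply dropped.
  static List<Token> fold(List<String> raw) {
    List<Token> out = new ArrayList<>();
    String pendingLeft = "";
    for (int i = 0; i < raw.size(); i++) {
      String t = raw.get(i);
      if (t.isBlank()) {
        continue;
      }
      if (isQuote(t)) {
        boolean wordFollows = i + 1 < raw.size() && isWord(raw.get(i + 1));
        if (wordFollows) {
          pendingLeft += t;
        } else if (!out.isEmpty()) {
          out.get(out.size() - 1).rightQuotationMark += t;
        }
        continue;
      }
      Token tok = new Token(t);
      tok.leftQuotationMark = pendingLeft;
      pendingLeft = "";
      out.add(tok);
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> raw = List.of("Il", " ", "a", " ", "dit", " ", "«", "bonjour", "»", ".");
    List<Token> tokens = fold(raw);
    for (Token t : tokens) {
      System.out.println(t.text); // Il, a, dit, bonjour, . (one per line)
    }
    System.out.println(tokens.get(3).wrap("salut")); // «salut»
  }
}
```

Rules and disambiguation would then see only `Il a dit bonjour .`, while a suggestion for "bonjour" can still be rendered with its original quotation marks.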

Possible difficulties:
- GenericUnpairedBracketsRule must be modified accordingly.
- Some grammar and disambiguation rules may still need to know about the
quotation marks, so new attributes could be necessary (similar to
spacebefore="yes/no").
- Whitespace in French (e.g. the spaces inside « »).
- Other unexpected troubles.

Do you think this is a good approach?

I can try to implement it, but I am not really sure if it is worthwhile
because the problems it solves are relatively rare.

Regards,
Jaume Ortolà



2016-05-05 16:22 GMT+02:00 Jaume Ortolà i Font :

> Hi,
>
> I think Marcin talked about this idea some time ago.
>
> Sometimes tokens like quotation marks (or other characters) should be ignored
> in some rules. That is, the sentence should be checked as if these tokens
> were not present. Any idea how it could be implemented?
>
> Alternatively, tokens like these would have to be added to the patterns:
>
> [“‘”«"']
>
> I would need to modify a few dozen rules. But perhaps this is the best
> solution: it gives more control over the rule, the suggestions, possible
> false alarms, and so on. What do you think?
>
> Regards,
> Jaume Ortolà