Daniel Naber <daniel.na...@languagetool.org> wrote:

> Hi,
>
> currently, the LT rules written in XML are language-specific. Is there
> any reason for this limitation? There are some rules that could be used
> for all languages, e.g. misspellings of names, like "Linux Torvalds".
>
> Here's an idea how we could implement that:
>
> -Create a new Maven project languagetool-language-modules/global that
> has a grammar.xml file where the language-independent rules are stored.
> Rules could look like this:
>
> <rule ...>
>    <pattern>
>      <token>Linux</token>
>      <token>Torvalds</token>
>    </pattern>
>    <message>i18n:misspelled_name</message>
>    <suggestion>Linus Torvalds</suggestion>
>    ...
> </rule>
>
> 'misspelled_name' is a key in the existing translation file, so that the
> message can be translated at Transifex. Maybe if there's no translation,
> the rule shouldn't become active?
>
> -Change the dependencies so that every language depends on this new
> module
>
> -Adapt the Java code to load the rules from the new file, in addition
> to the existing rules
>
> Any ideas or comments?
>
> Regards
>   Daniel

About half of each rule would still have to be customized per language:
the <message>, the <url>, and the <example>s.  Some languages may also
need to add their own exceptions to the pattern.  So I'm not sure
it's worth adding a feature for this.

Having said that, it would be good to know which rules in one language
could be useful in others.

Regarding your example, the French grammar already has this
rule by the way:

      <rule>
        <pattern>
          <token>Linux</token>
          <token regexp="yes">Th?orvald?s?</token>
        </pattern>
        <message>Écrivez <suggestion>Linus Torvalds</suggestion> s’il s’agit du créateur de Linux.</message>
        <url>https://fr.wikipedia.org/wiki/Linus_Torvalds</url>
        <example type="incorrect"><marker>Linux Torvalds</marker></example>
        <example type="correct">Linus Torvalds</example>
      </rule>
      <rule>
        <pattern>
          <token>Linus</token>
          <token regexp="yes">Th?orvald?s?<exception>Torvalds</exception></token>
        </pattern>
        <message>Écrivez <suggestion>Linus Torvalds</suggestion> s’il s’agit du créateur de Linux.</message>
        <url>https://fr.wikipedia.org/wiki/Linus_Torvalds</url>
        <example type="incorrect"><marker>Linus Thorvalds</marker></example>
        <example type="correct">Linus Torvalds</example>
      </rule>

Many other commonly misspelled names are already covered by the
French grammar rules and could be reused in other languages.
Some examples:

- Jimmy Hendrix -> Jimi Hendrix
- Forest Gump -> Forrest Gump
- Axle Rose -> Axl Rose
- Megadeath -> Megadeth
- etc.

Feel free to copy them from the French rule NOM_MAL_EPELE
into other languages.
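
For instance, English versions of the Hendrix and Megadeth entries might
look roughly like this.  This is only a sketch: the rule ids, names and
message wording are made up for illustration and are not taken from the
French file.  The pattern carries over as-is; the <message>, <url> and
<example>s are the parts that need rewriting for each language.

      <!-- Sketch only: id, name and message text are assumptions -->
      <rule id="JIMMY_HENDRIX" name="Misspelled name: Jimi Hendrix">
        <pattern>
          <token>Jimmy</token>
          <token>Hendrix</token>
        </pattern>
        <message>Did you mean <suggestion>Jimi Hendrix</suggestion>, the guitarist?</message>
        <url>https://en.wikipedia.org/wiki/Jimi_Hendrix</url>
        <example type="incorrect"><marker>Jimmy Hendrix</marker></example>
        <example type="correct">Jimi Hendrix</example>
      </rule>
      <!-- Sketch only: single-token variant, same assumptions as above -->
      <rule id="MEGADEATH" name="Misspelled name: Megadeth">
        <pattern>
          <token>Megadeath</token>
        </pattern>
        <message>Did you mean the band <suggestion>Megadeth</suggestion>?</message>
        <url>https://en.wikipedia.org/wiki/Megadeth</url>
        <example type="incorrect"><marker>Megadeath</marker></example>
        <example type="correct">Megadeth</example>
      </rule>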

Regards
Dominique

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
