Marcin Miłkowski <list-addr...@wp.pl> wrote:

> On 2014-05-29 10:01, Marcin Miłkowski wrote:
>> On 2014-05-28 21:42, Dominique Pellé wrote:
>>> Hi
>>>
>>> Searching for >> in grammar.xml files, I see things that
>>>    are wrong, or at least suspicious:
>>>
>>> $ ack-grep --xml '>>' languagetool-language-modules/*/src
>>>
>>> languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/grammar.xml
>>> 25390:                <token negate="yes">></token>
>>> 25400:                <token>></token>
>>> 25423:                <token negate="yes">></token>
>>>
>>> languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml
>>> 10243:                <marker>><token postag="CD"/>
>>>
>>> languagetool-language-modules/ru/src/main/resources/org/languagetool/rules/ru/grammar.xml
>>> 935:            <!--Перед сравнительным оборотом стоит "не" или слова:
>>> совсем, совершенно, почти, именно  ->> запятая не ставится.
>>>
>>>
>>> I'm surprised that tests did not pick up automatically
>>> the >> inside the <marker> tags, at least in the English
>>> grammar.xml.  The <marker> tag should never contain text
>>> but only other sub-tags.  This kind of error can probably
>>> be detected automatically.
>>
>> Indeed, this could be detected during validation. The only problem is
>> that the <marker> tag is used to mark up simple text content inside
>> <example> tags, and it's not so trivial to define XML Schema to allow no
>> text content inside one tag (<pattern>), but some content inside another
>> (<example>). At least, I couldn't find an easy way. Anyway, XML
>> specialists are welcome to look at rules.xsd and pattern.xsd.
>
> OK, it turned out that it was pretty easy to set up. Now we test the
> marker element correctly. I found one mistake in French rules this way,
> and I fixed it.


Thanks Marcin for the change.
However, the fix in the French grammar.xml is not right:

Before your change, it was wrong as follows:

       <pattern>
          <marker>la</marker>
          <marker><token>gente</token></marker>

After your change, it's still wrong as follows:

       <pattern>
          <marker><token>la</token></marker>
          <marker><token>gente</token></marker>

I don't think it makes sense to have 2 <marker>...</marker>
in the same pattern. So it should instead be:

       <pattern>
          <token>la</token>
          <marker><token>gente</token></marker>

I've just fixed that in commit 4f4d7d0e6d0708abdf44f5affb7f8d1f221e1204.

Interestingly, the tests do not detect the presence
of multiple <marker>...</marker> elements in a pattern.
Could we perhaps also detect that?
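
To illustrate what such a check could look like, here is a minimal
sketch in Python. It is not how the LanguageTool tests or the XSD
validation work; the function name check_markers is hypothetical, and
the sketch assumes the rules can be parsed with the standard
xml.etree.ElementTree module (no namespaces, no unresolved external
entities). It flags a <pattern> that holds more than one <marker>, and
a <marker> inside a <pattern> that contains text directly.

```python
import xml.etree.ElementTree as ET


def check_markers(xml_text):
    """Return a list of problem descriptions for <pattern> elements.

    Hypothetical helper for illustration only; not part of LanguageTool.
    """
    problems = []
    root = ET.fromstring(xml_text)
    for pattern in root.iter("pattern"):
        markers = list(pattern.iter("marker"))
        # More than one <marker> in the same pattern is suspicious.
        if len(markers) > 1:
            problems.append(
                "pattern has %d <marker> elements" % len(markers))
        for marker in markers:
            # Direct text is either before the first child (marker.text)
            # or after any child element (child.tail).
            texts = [marker.text] + [child.tail for child in marker]
            if any(t and t.strip() for t in texts):
                problems.append("<marker> in pattern contains text")
    return problems
```

Because the check only walks <pattern> elements, a <marker> with plain
text inside <example> is left alone, which matches the distinction
Marcin described above.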

Regards
Dominique

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
