Marcin Miłkowski <list-addr...@wp.pl> wrote:

> On 2014-05-29 10:01, Marcin Miłkowski wrote:
>> On 2014-05-28 21:42, Dominique Pellé wrote:
>>> Hi
>>>
>>> Searching for >> in grammar.xml files, I see things that
>>> are wrong, or at least suspicious:
>>>
>>> $ ack-grep --xml '>>' languagetool-language-modules/*/src
>>>
>>> languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/grammar.xml
>>> 25390: <token negate="yes">></token>
>>> 25400: <token>></token>
>>> 25423: <token negate="yes">></token>
>>>
>>> languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml
>>> 10243: <marker>><token postag="CD"/>
>>>
>>> languagetool-language-modules/ru/src/main/resources/org/languagetool/rules/ru/grammar.xml
>>> 935: <!--Перед сравнительным оборотом стоит "не" или слова:
>>> совсем, совершенно, почти, именно ->> запятая не ставится.
>>>
>>>
>>> I'm surprised that tests did not pick up automatically
>>> the >> inside the <marker> tags, at least in the English
>>> grammar.xml. The <marker> tag should never contain text
>>> but only other sub-tags. Probably this kind of errors can
>>> automatically be detected.
>>
>> Indeed, this could be detected during validation. The only problem is
>> that the <marker> tag is used to mark up simple text content inside
>> <example> tags, and it's not so trivial to define XML Schema to allow no
>> text content inside one tag (<pattern>), but some content inside another
>> (<example>). At least, I couldn't find an easy way. Anyway, XML
>> specialists are welcome to look at rules.xsd and pattern.xsd.
>
> OK, it turned out that it was pretty easy to set up. Now we test the
> marker element correctly. I found one mistake in French rules this way,
> and I fixed it.

Thanks Marcin for the change. However, the fix in the French
grammar.xml is not right.

Before your change, it was wrong as follows:

  <pattern>
    <marker>la</marker>
    <marker><token>gente</token></marker>

After your change, it's still wrong as follows:

  <pattern>
    <marker><token>la</token></marker>
    <marker><token>gente</token></marker>

I don't think it makes sense to have 2 <marker>...</marker>
in the same pattern. So it should instead be:

  <pattern>
    <token>la</token>
    <marker><token>gente</token></marker>

I've just fixed that in commit 4f4d7d0e6d0708abdf44f5affb7f8d1f221e1204.

Interestingly, the tests do not detect the presence of multiple
<marker>...</marker> tags in the pattern. Could we perhaps
also detect that?

Regards
Dominique

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
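
For illustration, the two checks discussed in this thread (stray text directly inside <marker>, and more than one <marker> in a <pattern>) could be sketched roughly as below. This is only a standalone Python sketch, not how the LanguageTool tests or rules.xsd/pattern.xsd actually do it: the function name find_marker_problems and the SAMPLE fragment are hypothetical, and it assumes the XML parses without external entities and uses no namespaces.

```python
import xml.etree.ElementTree as ET


def find_marker_problems(xml_text):
    """Return (pattern_index, problem, detail) tuples for suspicious <marker> use.

    Hypothetical helper for illustration only; it is not part of LanguageTool.
    """
    problems = []
    root = ET.fromstring(xml_text)
    for index, pattern in enumerate(root.iter("pattern")):
        markers = list(pattern.iter("marker"))
        # Check 1: more than one <marker> inside the same <pattern>.
        if len(markers) > 1:
            problems.append((index, "multiple-markers", len(markers)))
        for marker in markers:
            # Check 2: text directly inside <marker>, i.e. its own .text
            # plus the .tail that follows each of its child elements.
            pieces = [marker.text or ""] + [child.tail or "" for child in marker]
            stray = "".join(pieces).strip()
            if stray:
                problems.append((index, "text-in-marker", stray))
    return problems


# Made-up fragment mirroring the three French variants quoted above.
SAMPLE = """
<rules>
  <rule><pattern><marker>la</marker><marker><token>gente</token></marker></pattern></rule>
  <rule><pattern><marker><token>la</token></marker><marker><token>gente</token></marker></pattern></rule>
  <rule><pattern><token>la</token><marker><token>gente</token></marker></pattern></rule>
</rules>
"""

if __name__ == "__main__":
    for problem in find_marker_problems(SAMPLE):
        print(problem)
```

Only <marker> elements under <pattern> are inspected, so plain text inside <marker> within <example> is left alone, which matches the distinction Marcin describes.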