Hi,

I have created my first rule. It mostly works, but its behaviour is odd. The rule should find a phrase consisting of 'koho' + an inflected form of byť + some arbitrary words + an inflected noun. The odd thing is that segments of different lengths are marked even though the sentences are grammatically equivalent (the marked segment is in curly braces):

Nevedel, {koho je to nové auto}. (He did not know whose the new car is.)
Nevedel, {koho je to modré} auto.
Nové (new) and modré (blue) are both adjectives in the same form, so both should be skipped.

Nevedel, {koho je to nové a} rýchle auto.  (new and fast)
Nevedel, {koho je to nové i} rýchle auto.
Nevedel, {koho je to nové aj rýchle auto}.

Here, 'a', 'i' and 'aj' are conjunctions, all meaning 'and', so all of them should be skipped.

To understand the problem, I ran the command-line version of LT with the -t switch. Judging by its output, the tagging seems to be incorrect.

For example, for the 'aj' conjunction I get
aj[aj/J,aj/O,aj/T]
which is correct. However, for the 'a' conjunction I get
a[a/J,a/O,a/Q,a/SUnp1,a/SUnp2,a/SUnp3,a/SUnp4,a/SUnp5,a/SUnp6,a/SUnp7,a/SUns1,a/SUns2,a/SUns3,a/SUns4,a/SUns5,a/SUns6,a/SUns7,a/T,a/W,as/W]
So 'a' is also tagged as a noun (the 'S' tag), which explains why the marking in
Nevedel, {koho je to nové a} rýchle auto
stops after 'a'.

Something similar happens with adjectives (the 'A' tag), which are also recognized as nouns (the 'S' tag):
modré[modré/SAns1,modré/SAns4,modré/SAns5,modrá/SAfp1,modrá/SAfp4,modrá/SAfp5,modrý/AAfp1x,modrý/AAfp4x,modrý/AAfp5x,modrý/AAip1x,modrý/AAip4x,modrý/AAip5x,modrý/AAnp1x,modrý/AAnp4x,modrý/AAnp5x,modrý/AAns1x,modrý/AAns4x,modrý/AAns5x]
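
A quick way to check this explanation is to test a few of the readings above against the rule's noun pattern. The Python sketch below is only an illustration: it assumes that the postag regexp must match the whole tag (not confirmed here against the LT sources), and the reading lists are shortened to a few tags copied from the -t output. The names READINGS, NOUN_TAG and noun_readings are made up for this example.

```python
import re

# A few readings copied from the -t output above (lists shortened).
READINGS = {
    "aj": ["J", "O", "T"],
    "a": ["J", "O", "Q", "SUnp1", "SUns1", "T", "W"],
    "modré": ["SAns1", "SAns4", "AAns1x", "AAns4x"],
}

# The postag pattern used by the third token of the rule.
NOUN_TAG = re.compile(r"S....")


def noun_readings(tags):
    """Return the tags that the noun pattern matches in full (assumed behaviour)."""
    return [tag for tag in tags if NOUN_TAG.fullmatch(tag)]


for word, tags in READINGS.items():
    # A non-empty list means the skip would stop at this word.
    print(word, noun_readings(tags))
```

Under that assumption, 'aj' has no reading that matches, while 'a' and 'modré' each have noun readings, which is consistent with where the marking stops.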

This is perhaps a problem outside LT. If so, I should probably talk to the author of the tagging tool. Do you have a contact for them?

Thanks
Milos

The rule:
            <rule>
                <pattern>
                    <token>koho</token>
                    <token skip="-1" inflected="yes">byť</token>
                    <!-- find a noun here -->
                    <token postag="S...." postag_regexp="yes"></token>
                </pattern>
                <message>
                    Zámeno 'koho' nahraďte zámenom 'čí':
                    <suggestion><match no="3" postag="S.(.)(.)(.)" postag_replace="PA$1$2$3">čí</match> <match no="2" include_skipped="all"/> <match no="3"/></suggestion>.
                </message>
                <short>Možná gramatická chyba. Zámeno 'koho' možno treba nahradiť zámenom 'čí' (pravidlo 3)</short>
                <example correction="čí bol ten vysoký dom" type="incorrect">
                    Nevedel, <marker>koho bol ten vysoký dom</marker>.
                </example>
                <example type="correct">
                    Nevedel, čí bol ten vysoký dom.
                </example>
            </rule>
