Thanks Marcin

I'll take a look at the Chunker first and then see whether grammar.xml
is enough for what I am trying to do. The thing is that I am shooting for
two targets: agreeing nouns with adjectives (which should have
overlapping case/gender tags) and agreeing prepositions/verbs with
noun/adjective chunks (where the tags are different, e.g. the Genitive
tag on a noun is v_rod, while the tag string for a preposition requiring
Genitive or Dative is rv_rod:rv_dav). So even if the first one can be
done with grammar.xml, I am not sure the second one can be done that way
as easily.
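
To make the second target concrete, here is a rough sketch in Java of
the kind of check I have in mind. It is not existing LT code: the class
and method names are made up, and the full tag strings ("prep:...",
"noun:m:...", and the v_zna tag) are assumptions for illustration. Only
v_rod and rv_rod:rv_dav come from the dictionary as described above.
The idea is to strip the rv_ / v_ prefixes and see whether the two
sets of cases overlap.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of LanguageTool.
final class CaseAgreement {

    // Collect the colon-separated parts of a tag that start with the given
    // prefix, with the prefix removed: ("rv_rod:rv_dav", "rv_") -> {rod, dav}.
    static Set<String> cases(String tag, String prefix) {
        return Arrays.stream(tag.split(":"))
                .filter(part -> part.startsWith(prefix))
                .map(part -> part.substring(prefix.length()))
                .collect(Collectors.toCollection(HashSet::new));
    }

    // True if at least one case carried by the noun/adjective tag is among
    // the cases required by the preposition (or verb) tag.
    static boolean agrees(String governorTag, String dependentTag) {
        Set<String> required = cases(governorTag, "rv_");
        Set<String> actual = cases(dependentTag, "v_");
        actual.retainAll(required); // keep only the cases both sides share
        return !actual.isEmpty();
    }

    public static void main(String[] args) {
        // Assumed tag layouts; only v_rod and rv_rod:rv_dav are from the dictionary.
        System.out.println(agrees("prep:rv_rod:rv_dav", "noun:m:v_rod")); // true
        System.out.println(agrees("prep:rv_rod:rv_dav", "noun:m:v_zna")); // false
    }
}
```

A real rule would also have to handle words with several readings and
the nouns that do not decline, which this sketch ignores.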

Yes, we have a real POS dictionary for Ukrainian (since 2.2, I believe);
it's based on the spell-uk project and has about 140k lemmas. It has some
problem areas (mostly around Dative and maybe some other cases), but so
far I haven't hit anything major using it in LT.

I took a quick look at UGTagger and I am not sure we can benefit much
from UGTag. Most of what UGTag does is also done by LT, plus it is
hardcoded for Ukrainian, so it won't be easy to reuse things in LT.
They might have a better disambiguator (Ukrainian in LT pretty much
does not have one), but I am not sure how easy it would be to port.
Please correct me if you know more than I do and I am wrong.
We also can't use their POS dictionary (which is probably better than
what we have, as it's developed by an official institution) because
it's copyrighted.
I also talked to a person who has contacts with the authors, and he was
not sure they would be much interested in collaboration. But I will
send them a quick email anyway to see whether that's really the case.

Andriy

2013/11/20 Marcin Miłkowski <[email protected]>:
> Hi Andriy,
>
> On 2013-11-19 22:47, Andriy Rysin wrote:
>> I am thinking to add rules to Ukrainian that would check if related
>> words agree on case/gender etc. There are several primary cases for
>> this:
>> 1) having the adjective and noun agree in case, gender etc.
>> 2) having the noun's/adjective's gender or plural form match that of the verb
>> 3) having the adjective and/or noun be in the right case if it follows
>> a preposition or a verb that requires some particular case
>>
>> I was thinking about 3) for a bit and it looks like it'll be too hard
>> to implement this in grammar.xml: we have 7 cases in Ukrainian,
>> some prepositions may allow several cases for the following
>> nouns/adjectives, and some nouns don't change (currently I just mark
>> them as such in the dictionary instead of exploding the same word 7
>> times, which may be the more correct way to go).
>>
>> So I was going to take a shot at doing this in Java (and I guess along
>> the way I'll see if it makes sense to follow a similar pattern for 1 and
>> 2). But before I start I wanted to double-check that there's no
>> good/common/existing way of doing things like that.
>
> You might look at unification. Basically, we are able to find
> non-agreeing words very easily if they are POS-tagged appropriately.
>
> http://wiki.languagetool.org/using-unification
>
> Note however that in Polish, that would generate zillions of false
> alarms as there might be several noun phrases in different grammatical
> cases, and such a check would also match all boundaries of phrases (last
> word of the first phrase and the first word of the next phrase). You
> should also mark up the phrases ("chunks") first. I want to add marking
> of chunks in the disambiguator but I don't have time currently to do
> that. But it is definitely possible to mark chunks in Polish using
> unification (with a few additions).
>
> Speaking of which, is there a real POS dictionary for Ukrainian in LT? I
> thought you have only a lemmatiser. We might integrate UGTagger after all?
>
> Regards,
> Marcin
>
> ------------------------------------------------------------------------------
> Shape the Mobile Experience: Free Subscription
> Software experts and developers: Be at the forefront of tech innovation.
> Intel(R) Software Adrenaline delivers strategic insight and game-changing
> conversations that shape the rapidly evolving mobile landscape. Sign up now.
> http://pubads.g.doubleclick.net/gampad/clk?id=63431311&iu=/4140/ostg.clktrk
> _______________________________________________
> Languagetool-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
