Thanks, Marcin. I'll take a look at the Chunker first and then see whether grammar.xml is enough for what I'm trying to do. The thing is that I'm shooting for two targets: agreement between nouns and adjectives (which should have overlapping case/gender tags), and agreement between prepositions/verbs and noun/adjective chunks (where the tags differ, e.g. the Genitive tag on a noun is v_rod, while the tag string on a preposition that requires the Genitive or Dative is rv_rod:rv_dav). So even if the first can be done with grammar.xml, I'm not sure the second can be done that way as easily.
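A rough Java sketch of the two checks, written only under assumptions: the tag format is hypothetical, modelled on the v_rod and rv_rod:rv_dav examples above. Each reading is assumed to be a colon-separated string such as "adj:f:v_rod", gender is one of "m"/"f"/"n"/"p", the seven case tags are assumed to follow the v_rod pattern, uninflected nouns are assumed to carry a hypothetical "nv" marker (instead of being exploded into seven readings), and a preposition's required case rv_rod is assumed to map to the noun case v_rod by dropping the leading "r". It does not use LanguageTool's own classes. Target 1 intersects the case/gender combinations allowed by every token in a chunk (a simple form of unification; an empty result means no agreement); target 2 translates the preposition's rv_* tags and checks them against what the chunk agreed on.

```java
import java.util.*;

public class AgreementSketch {

    // Hypothetical tag inventory, modelled on v_rod / rv_rod from the email.
    static final List<String> CASES =
        List.of("v_naz", "v_rod", "v_dav", "v_zna", "v_oru", "v_mis", "v_kly");
    static final List<String> GENDERS = List.of("m", "f", "n", "p");
    // Hypothetical marker for nouns that do not inflect for case.
    static final String UNINFLECTED = "nv";

    /** All "case/gender" combinations allowed by one token's readings. */
    static Set<String> features(List<String> readings) {
        Set<String> result = new HashSet<>();
        for (String reading : readings) {
            Set<String> parts = new HashSet<>(Arrays.asList(reading.split(":")));
            boolean uninflected = parts.contains(UNINFLECTED);
            for (String c : CASES) {
                // An uninflected reading is compatible with every case.
                if (!uninflected && !parts.contains(c)) continue;
                for (String g : GENDERS) {
                    if (parts.contains(g)) result.add(c + "/" + g);
                }
            }
        }
        return result;
    }

    /** Target 1: features shared by all tokens in a chunk; empty means no agreement. */
    static Set<String> unify(List<List<String>> chunk) {
        Set<String> shared = null;
        for (List<String> token : chunk) {
            Set<String> f = features(token);
            if (shared == null) shared = f; else shared.retainAll(f);
        }
        return shared == null ? Set.of() : shared;
    }

    /** Maps a preposition's required cases ("rv_rod") to noun case tags ("v_rod"). */
    static Set<String> requiredCases(String prepTag) {
        Set<String> result = new HashSet<>();
        for (String part : prepTag.split(":")) {
            if (part.startsWith("rv_")) result.add(part.substring(1));
        }
        return result;
    }

    /** Target 2: true if the chunk agrees internally in at least one case the preposition allows. */
    static boolean governs(String prepTag, List<List<String>> chunk) {
        Set<String> required = requiredCases(prepTag);
        for (String feature : unify(chunk)) {
            String caseTag = feature.substring(0, feature.indexOf('/'));
            if (required.contains(caseTag)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical adjective + noun chunk whose only shared reading is feminine Genitive.
        List<List<String>> chunk = List.of(
            List.of("adj:f:v_rod", "adj:f:v_dav"),
            List.of("noun:f:v_rod"));
        System.out.println(unify(chunk));                          // [v_rod/f]
        System.out.println(governs("prep:rv_rod:rv_dav", chunk));  // true
        System.out.println(governs("prep:rv_zna", chunk));         // false
    }
}
```

Checking a whole chunk rather than each adjacent word pair is what avoids comparing the last word of one phrase with the first word of the next, which is where the phrase-boundary false alarms come from; the chunk boundaries themselves would still have to come from the Chunker or the disambiguator.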
Yes, we have a real POS dictionary for Ukrainian (since 2.2, I believe). It's based on the spell-uk project and has about 140k lemmas. It has some problem areas (mostly around the Dative and maybe some other cases), but so far I haven't hit anything major using it in LT.

I took a quick look at UGTagger and I'm not sure we can benefit much from UGTag. Most of what UGTag does is also done by LT, and it is hardcoded for Ukrainian, so it won't be easy to reuse its parts in LT. It might have a better disambiguator (Ukrainian in LT has practically none), but I'm not sure how easy that would be to port. Please correct me if you know more than I do and I'm wrong. We also can't use their POS dictionary (which is probably better than ours, as it's developed by an official institution) because it's copyrighted. I also talked to someone who has contacts with the authors, and he wasn't sure they would be much interested in collaborating. Still, I'll send them a quick email to check.

Andriy

2013/11/20 Marcin Miłkowski <[email protected]>:
> Hi Andriy,
>
> On 2013-11-19 22:47, Andriy Rysin wrote:
>> I am thinking of adding rules to Ukrainian that would check whether related
>> words agree in case, gender, etc. There are several primary cases for
>> this:
>> 1) an adjective and a noun having the same case, gender, etc.
>> 2) a noun's/adjective's gender or plural form matching that of the verb
>> 3) an adjective and/or noun being in the right case if it follows
>> a preposition or a verb that requires a particular case
>>
>> I was thinking about 3) for a bit and it looks like it'll be too hard
>> to implement this in grammar.xml: we have 7 cases in Ukrainian,
>> some prepositions allow several cases for the following
>> nouns/adjectives, and some nouns don't change (currently I just mark
>> them as such in the dictionary instead of exploding the same word 7 times,
>> which may be the more correct way to go).
>>
>> So I was going to take a shot at doing this in Java (and I guess along
>> the way I'll see if it makes sense to follow a similar pattern for 1 and
>> 2). But before I started I wanted to double-check that there's no
>> good/common/existing way of doing things like that.
>
> You might look at unification. Basically, we are able to find
> non-agreeing words very easily if they are POS-tagged appropriately.
>
> http://wiki.languagetool.org/using-unification
>
> Note, however, that in Polish that would generate zillions of false
> alarms, as there might be several noun phrases in different grammatical
> cases, and such a check would also match all phrase boundaries (the last
> word of the first phrase and the first word of the next phrase). You
> should also mark up the phrases ("chunks") first. I want to add chunk
> marking to the disambiguator, but I don't currently have time to do
> that. But it is definitely possible to mark chunks in Polish using
> unification (with a few additions).
>
> Speaking of which, is there a real POS dictionary for Ukrainian in LT? I
> thought you had only a lemmatiser. Might we integrate UGTagger after all?
>
> Regards,
> Marcin
>
> ------------------------------------------------------------------------------
> Shape the Mobile Experience: Free Subscription
> Software experts and developers: Be at the forefront of tech innovation.
> Intel(R) Software Adrenaline delivers strategic insight and game-changing
> conversations that shape the rapidly evolving mobile landscape. Sign up now.
> http://pubads.g.doubleclick.net/gampad/clk?id=63431311&iu=/4140/ostg.clktrk
> _______________________________________________
> Languagetool-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
