When two words can be confused, we need to determine whether the other 
words in the sentence (better: the paragraph) point significantly to the 
other meaning.

That would need an LT XML rule for every potential confusion, or one 
special Java rule that takes care of all of them.

So word1,word2,contextwords1,contextwords2 is all you need to process 
it, imho. No real classification is necessary. This approach is more 
fundamental, but also more difficult.
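
A minimal sketch of how such an entry could be processed, in Java since 
a single Java rule is one of the options. The class and method names, 
the field format (context words separated by spaces) and the margin 
threshold are all assumptions for illustration, not LanguageTool API:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class ConfusionCheck {

    // Hypothetical entry: word1,word2,contextwords1,contextwords2
    static final class Entry {
        final String word1;
        final String word2;
        final Set<String> context1;
        final Set<String> context2;

        Entry(String word1, String word2, Set<String> context1, Set<String> context2) {
            this.word1 = word1;
            this.word2 = word2;
            this.context1 = context1;
            this.context2 = context2;
        }
    }

    // Assumed format: four comma-separated fields, context words split on spaces.
    static Entry parse(String line) {
        String[] f = line.split(",");
        return new Entry(f[0].trim().toLowerCase(Locale.ROOT),
                         f[1].trim().toLowerCase(Locale.ROOT),
                         toSet(f[2]), toSet(f[3]));
    }

    private static Set<String> toSet(String field) {
        Set<String> set = new HashSet<>(
            Arrays.asList(field.trim().toLowerCase(Locale.ROOT).split("\\s+")));
        set.remove("");
        return set;
    }

    // Returns the word to suggest instead, or null if the context does not
    // favour the other word by at least minMargin context-word hits.
    static String suggest(Entry e, String text, int minMargin) {
        boolean has1 = false;
        boolean has2 = false;
        int hits1 = 0;
        int hits2 = 0;
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (t.isEmpty()) {
                continue;
            }
            if (t.equals(e.word1)) {
                has1 = true;
            } else if (t.equals(e.word2)) {
                has2 = true;
            }
            if (e.context1.contains(t)) {
                hits1++;
            }
            if (e.context2.contains(t)) {
                hits2++;
            }
        }
        if (has1 && !has2 && hits2 - hits1 >= minMargin) {
            return e.word2;
        }
        if (has2 && !has1 && hits1 - hits2 >= minMargin) {
            return e.word1;
        }
        return null;
    }
}
```

With an entry such as "desert,dessert,sand camel dry,sweet cake chocolate" 
and a margin of 2, a sentence that contains "desert" next to "sweet", 
"chocolate" and "cake" would yield the suggestion "dessert". Counting raw 
hits is only a stand-in here for a real significance measure.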

These kinds of entries might be easy to generate. Actually, I am 
collecting word-word combinations right now, and will be able to 
determine whether the relations are significant. It takes a lot of 
computer time, but there is time.
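
As an illustration of what "significant" could mean for a word-word 
combination, here is a sketch that counts sentence-level co-occurrences 
and scores a pair with pointwise mutual information (PMI). PMI is only 
one possible measure and is an assumption here, as are the class and 
method names; the actual collection may work quite differently:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

public class CooccurrenceStats {

    private final Map<String, Integer> wordCounts = new HashMap<>();
    private final Map<String, Integer> pairCounts = new HashMap<>();
    private int sentences = 0;

    // Counts each distinct word and each distinct word pair once per sentence.
    void addSentence(String sentence) {
        TreeSet<String> distinct = new TreeSet<>();
        for (String t : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                distinct.add(t);
            }
        }
        sentences++;
        List<String> words = new ArrayList<>(distinct);
        for (int i = 0; i < words.size(); i++) {
            wordCounts.merge(words.get(i), 1, Integer::sum);
            for (int j = i + 1; j < words.size(); j++) {
                pairCounts.merge(key(words.get(i), words.get(j)), 1, Integer::sum);
            }
        }
    }

    // Order-independent key for a word pair.
    private static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + " " + b : b + " " + a;
    }

    // PMI in bits: log2( P(a,b) / (P(a) * P(b)) ), with probabilities
    // estimated as sentence frequencies. Negative infinity if never seen together.
    double pmi(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int both = pairCounts.getOrDefault(key(x, y), 0);
        int countA = wordCounts.getOrDefault(x, 0);
        int countB = wordCounts.getOrDefault(y, 0);
        if (both == 0 || countA == 0 || countB == 0) {
            return Double.NEGATIVE_INFINITY;
        }
        double ratio = (double) both * sentences / ((double) countA * countB);
        return Math.log(ratio) / Math.log(2);
    }
}
```

Pairs scoring above some threshold could then be written out as context 
words for an entry. On small counts PMI is unreliable, so a real run 
would probably also need a minimum frequency or a proper statistical test.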

Ruud

On 16-05-12 22:21, Marcin Miłkowski wrote:
> On 2012-05-16 20:10, Jan Schreiber wrote:
>
>> BTW, it should be possible to store at least those entities outside the
>> file itself, but I don't know how. --Jan
> Well, I had a look and it seems that you are using some of the entities
> to define fairly long regular expressions (disjunctions). This slows
> down LT quite substantially (I profiled some rules in the Polish XML
> file). I had such long lists for Polish reflexive verbs, and I decided
> to add a new POS tag for that, and it made processing much faster.
>
> But my solution was a hack that can be made more general. We do not need
> to include such new classifications in the normal tagger file: as our
> taggers can be used instead of all such disjunctive regular expressions,
> you could also simply include lists of adjectives referring to languages
> (sprachadj) in a dedicated semantic tagger file. This might be read by a
> manual tagger or a morfologik-stemming tagger (which will definitely
> work faster). We could, in principle, add a new attribute - a "semantic
> classification tag" - to XML that would be differentiated from a normal
> POS tag, and use our existing tagger infrastructure to support this new
> feature.
>
> I planned to use some parts of the Polish Wordnet for some rules, and
> it was only recently made available under a BSD-like license.
> Classifying some of the words semantically might be really useful for
> some rules.
>
> Regards
> Marcin
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>


