Ok, I've pushed a change to allow a per-language set of characters to be
ignored in tokens (e.g. Ukrainian adds the combining acute accent U+0301,
used as a stress mark, alongside the soft hyphen). Adding a reading with a
null tag seems to have affected correct position markup, so I've adjusted
my rules to take that into account.
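A minimal sketch of what a per-language set of ignored characters could look like. It assumes the soft hyphen (U+00AD) is ignored for every language and that Ukrainian additionally ignores U+0301. The names `IgnoredChars`, `forLanguage` and `clean` are hypothetical and are not LanguageTool's actual API. The only idea shown is that the tagger would see the cleaned form while the original token text is kept elsewhere.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not LanguageTool's API): per-language sets of
// characters that are stripped from a token before tagging.
final class IgnoredChars {
    // Default: only the soft hyphen (U+00AD) is ignored.
    private static final Set<Character> DEFAULT = Set.of('\u00AD');

    // Per-language overrides; Ukrainian also ignores the combining
    // acute accent U+0301 (used as a stress mark).
    private static final Map<String, Set<Character>> PER_LANGUAGE =
            Map.of("uk", Set.of('\u00AD', '\u0301'));

    private IgnoredChars() {}

    static Set<Character> forLanguage(String langCode) {
        return PER_LANGUAGE.getOrDefault(langCode, DEFAULT);
    }

    // Returns the "clean" form of the token with all ignored characters removed.
    static String clean(String token, String langCode) {
        Set<Character> ignored = forLanguage(langCode);
        StringBuilder sb = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!ignored.contains(c)) { // char is autoboxed to Character
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, `IgnoredChars.clean("ko\u0301t", "uk")` gives `"kot"`, while for `"en"` the accent stays in place because only the soft hyphen is in the default set.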

Please try it and let me know how it works for you,
Thanks
Andriy

P.S. One thing I could not figure out (yet) is the correct markup for
tokens with ignored characters in XML rules; see
languagetool-language-modules/uk/src/main/resources/org/languagetool/rules/uk/grammar-spelling.xml:93


2015-01-20 11:55 GMT-05:00 Andriy Rysin <ary...@gmail.com>:
> Ok, so I have a token agreement rule which checks whether any of the
> token's readings has the required form. If it finds one, good; if it
> doesn't, it shows an error. But if it finds a reading with a null tag,
> it assumes we don't know enough and skips the check for this token. It
> seems we use a null tag for untagged words, so this works when the
> reading with a null POS tag is the only one. If we're saying we can have
> additional readings with null tags that are "information-only", I can
> probably adjust the logic I have.
>
> We could also tag the reading with ignored chars inside the same way
> the "cleaned" token is tagged, but I am afraid the "dirty" token's
> reading would affect suggestions etc. in ways we don't want.
>
> Andriy
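The decision order Andriy describes can be sketched as follows. `Reading`, `Verdict` and `AgreementCheck` are hypothetical, simplified stand-ins: a reading here is just a lemma plus a possibly-null POS tag. They are not the actual rule code. `check` mirrors the current behaviour. `checkIgnoringInfoOnlyNulls` is one possible adjustment, in which a null-tagged reading means "unknown" only when no reading carries a tag at all.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a token reading: a lemma plus a POS tag that
// may be null for untagged words.
final class Reading {
    final String lemma;
    final String posTag;

    Reading(String lemma, String posTag) {
        this.lemma = lemma;
        this.posTag = posTag;
    }
}

enum Verdict { OK, SKIP, ERROR }

final class AgreementCheck {
    private AgreementCheck() {}

    // Current logic: any reading with the required form -> OK; otherwise any
    // null-tagged reading -> SKIP (not enough information); otherwise ERROR.
    static Verdict check(List<Reading> readings, Predicate<String> requiredForm) {
        boolean sawNullTag = false;
        for (Reading r : readings) {
            if (r.posTag == null) {
                sawNullTag = true;
            } else if (requiredForm.test(r.posTag)) {
                return Verdict.OK;
            }
        }
        return sawNullTag ? Verdict.SKIP : Verdict.ERROR;
    }

    // Possible adjustment: null-tagged readings next to tagged ones are treated
    // as information-only, so the check is skipped only if nothing is tagged.
    static Verdict checkIgnoringInfoOnlyNulls(List<Reading> readings,
                                              Predicate<String> requiredForm) {
        boolean anyTagged = false;
        for (Reading r : readings) {
            if (r.posTag != null) {
                anyTagged = true;
                if (requiredForm.test(r.posTag)) {
                    return Verdict.OK;
                }
            }
        }
        return anyTagged ? Verdict.ERROR : Verdict.SKIP;
    }
}
```

Take readings tagged `noun:nom` and `null` and a rule requiring `noun:gen` (example tags only). In that case `check` returns `SKIP` while the adjusted variant returns `ERROR`. That is the difference an extra "information-only" null reading makes.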
>
> 2015-01-20 9:58 GMT-05:00 Daniel Naber <daniel.na...@languagetool.org>:
>> On 2015-01-20 14:29, Andriy Rysin wrote:
>>
>>> So in JLanguageToolTest.testAnalyzedSentence() (line 133) the expected
>>> reading for the token with a soft hyphen is test­ed/null, but I don't
>>> really understand this logic.
>>
>> I think the null is probably not the point; the code in
>> JLanguageTool.getRawAnalyzedSentence() seems to re-add the token with
>> the soft hyphen. It probably simply uses null as a POS tag because
>> I (or whoever added it) thought it wouldn't hurt. So maybe just the
>> token needs to be set, not another reading (adding the null reading may
>> be just a side effect).
>>
>> Regards
>>   Daniel
>>
>>
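Daniel's suggestion can be made concrete with a sketch that contrasts two options. The first re-adds the soft-hyphen form as an extra null-tagged reading. The second only records the surface text and leaves the readings from the cleaned form untouched. The model is assumed and simplified: `SurfaceToken` and its factory methods are hypothetical, not LanguageTool's `AnalyzedTokenReadings` API. The POS tag `VBD` is just an example value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simplified token model (not LanguageTool's AnalyzedTokenReadings):
// the surface text as written plus the POS tags of its readings (null = untagged).
final class SurfaceToken {
    final String surface;       // original text incl. ignored chars; used for positions/markup
    final List<String> posTags; // tags obtained by tagging the cleaned form

    private SurfaceToken(String surface, List<String> posTags) {
        this.surface = surface;
        // Copy via ArrayList because List.copyOf rejects null elements.
        this.posTags = Collections.unmodifiableList(new ArrayList<>(posTags));
    }

    // Roughly what the thread describes: the token with the soft hyphen is
    // re-added as an extra reading whose POS tag is null.
    static SurfaceToken withExtraNullReading(String surface, List<String> cleanTags) {
        List<String> tags = new ArrayList<>(cleanTags);
        tags.add(null);
        return new SurfaceToken(surface, tags);
    }

    // Daniel's suggestion: only set the token text and keep the readings as tagged.
    static SurfaceToken withSurfaceOnly(String surface, List<String> cleanTags) {
        return new SurfaceToken(surface, cleanTags);
    }

    boolean hasNullReading() {
        return posTags.contains(null);
    }
}
```

With the second option, a rule like the agreement check discussed in this thread would not see a spurious null reading. The original text would still be available for position markup.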
>> _______________________________________________
>> Languagetool-devel mailing list
>> Languagetool-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
