Re: equivalent and optional characters in words

Andriy Rysin Mon, 20 May 2013 20:26:30 -0700

On 04/21/2013 03:11 AM, Jaume Ortolà i Font wrote:

2013/4/21 Andriy Rysin <ary...@gmail.com>


    1) I would like to treat several apostrophes equally (apostrophes are
    part of the word in Ukrainian). E.g. in the dictionary and rules I could
    use ' (0x27), but I would like to be able to parse text that has U+2019
    (and potentially U+02BC) the same way. I guess I could do a simple
    replace in the word tokenizer, but I was wondering if there's a better
    way.

This is what is done in Catalan. So far I have found no problems.
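
For illustration, the simple replace discussed above could look roughly like the sketch below. It is written in Python rather than LanguageTool's Java and uses no LanguageTool API; the names `APOSTROPHE_MAP` and `normalize_apostrophes` are hypothetical. Each variant apostrophe is swapped for exactly one character, so the text length and every character position stay the same.

```python
# Hypothetical helper, not part of LanguageTool: map typographic
# apostrophes onto the plain ASCII one used in the dictionary and rules.
APOSTROPHE_MAP = str.maketrans({
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE
})


def normalize_apostrophes(text: str) -> str:
    """Replace U+2019 and U+02BC with U+0027, one character for one."""
    return text.translate(APOSTROPHE_MAP)


word = "п\u2019ять"
normalized = normalize_apostrophes(word)
# The replacement is one-to-one, so offsets into the text are unchanged.
assert len(normalized) == len(word)
```

Because nothing is inserted or deleted, an error position computed on the normalized text can be used directly on the original text.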

This seems to work pretty nicely for *replacing* chars, but if I also *remove* the accent (U+0301) from words in the word tokenizer, it looks like it messes up the error position in the sentence (at least in the web interface). Is there a right way to remove symbols I don't care about?
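
To make the position problem concrete: removing a character shifts everything after it, so a position found in the stripped text no longer matches the original. One possible workaround, sketched below in Python and not based on any LanguageTool API, is to record the original index of every kept character and map error spans back through that table. The names `strip_with_offsets` and `to_original_span` are hypothetical.

```python
COMBINING_ACUTE = "\u0301"  # the stress mark to be ignored


def strip_with_offsets(text, drop=(COMBINING_ACUTE,)):
    """Remove the characters in `drop`; also return, for each kept
    character, its index in the original text."""
    kept = []
    offsets = []
    for i, ch in enumerate(text):
        if ch in drop:
            continue
        kept.append(ch)
        offsets.append(i)
    # Sentinel entry so an exclusive end index at the very end can be mapped.
    offsets.append(len(text))
    return "".join(kept), offsets


def to_original_span(offsets, start, end):
    """Map a [start, end) span in the stripped text back to the original.
    Dropped characters directly after the span are included in it."""
    return offsets[start], offsets[end]


original = "за\u0301мок тут"
stripped, offsets = strip_with_offsets(original)
# An error reported on the first word of the stripped text, span [0, 5):
start, end = to_original_span(offsets, 0, 5)
assert original[start:end] == "за\u0301мок"
```

Python indexes strings by code point, while Java and browsers count UTF-16 units; the two agree for the characters used here, but a real implementation would need to count in whatever unit the consumer of the positions expects.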


Thanks
Andriy


_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
