Jaume, and everybody,
I have started implementing the features we need in MorfologikSpeller (in
the morfologik repository on GitHub).
First of all, there will be a bunch of new configurable properties in
the .info file; in short, we will have:
* "fsa.dict.speller.ignore-numbers" for ignoring numbers;
* "fsa.dict.speller.locale" for specifying the Locale (in BCP 47 style),
so that case conversions are fine;
* "fsa.dict.speller.ignore-punctuation" for ignoring punctuation;
* "fsa.dict.speller.ignore-diacritics" for ignoring diacritics;
* "fsa.dict.speller.convert-case" for treating uppercase and lowercase
characters as equivalent in diacritics conversion and when looking for
suggestions for words that start with an uppercase letter (think of
sentence starts);
* "fsa.dict.speller.runon-words" for turning the run-on words feature
on and off.
This way, no code will need to be touched to change the dictionary behavior.
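
To make the idea concrete, here is a minimal sketch of how such properties
could be read on the Java side. It is not the actual MorfologikSpeller code:
the class name InfoPropertiesSketch, the helper methods, and the sample
values ("true", "tr-TR", "false") are assumptions for illustration; only the
property names come from the list above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

public class InfoPropertiesSketch {
    // Hypothetical .info content; the values are assumed, not taken from a real dictionary.
    static final String SAMPLE =
        "fsa.dict.speller.ignore-numbers=true\n" +
        "fsa.dict.speller.locale=tr-TR\n" +
        "fsa.dict.speller.runon-words=false\n";

    // Parse properties from text; a Reader avoids the ISO-8859-1 limit of load(InputStream).
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Turn the BCP 47 tag into a Locale; falling back to Locale.ROOT is an assumption.
    static Locale localeOf(Properties p) {
        String tag = p.getProperty("fsa.dict.speller.locale");
        return (tag == null || tag.isEmpty()) ? Locale.ROOT : Locale.forLanguageTag(tag);
    }

    // Read a boolean switch, using the given default when the key is absent.
    static boolean flag(Properties p, String key, boolean defaultValue) {
        String value = p.getProperty(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Properties p = load(SAMPLE);
        Locale locale = localeOf(p);
        // Case conversion depends on the locale: in Turkish, 'I' lowercases to a dotless i.
        System.out.println("TITLE".toLowerCase(locale));
        System.out.println(flag(p, "fsa.dict.speller.ignore-numbers", false));
        System.out.println(flag(p, "fsa.dict.speller.runon-words", true));
    }
}
```

The Turkish tag is used only because it shows why the locale matters for
case conversion; any other BCP 47 tag is handled the same way.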
The dictionary now supports diacritics conversions. It works fairly well
for Polish but more tests will be needed.
The remaining features to implement are:
* multiple-character substitution [REP feature in hunspell]
* equivalent chars [MAP feature in hunspell]
I basically need a human-readable way to represent the substitution maps
as Java properties. In short, there are two ways:
(1) Create a very long string with two delimiters, for example | and =
(actually \=, as equals needs to be escaped), i.e.:
fsa.dict.speller.replacements=L·L\=L|abc\=xyz
fsa.dict.speller.map=L\=Ł|l\=ł
If you can find a better character than '=' here, it could work nicely.
The escape char makes it barely readable... (':' also needs escaping,
unfortunately).
(2) Create several replacements with systematically changing names, for
example:
fsa.dict.speller.replacement-1=L·L\=L
fsa.dict.speller.replacement-2=abc\=xyz
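
A minimal sketch of how both variants could be parsed, to help compare
them. Everything here except the property names and sample values quoted
above is hypothetical: the class ReplacementParsingSketch, its methods, and
the choice of a Map (which keeps only one target per source string) are
assumptions, not the actual implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class ReplacementParsingSketch {
    // Sample text in both styles; \u00B7 is the middle dot, \u0141 and \u0142 are the Polish L-stroke.
    static final String SAMPLE =
        "fsa.dict.speller.replacements=L\u00B7L\\=L|abc\\=xyz\n" +
        "fsa.dict.speller.map=L\\=\u0141|l\\=\u0142\n" +
        "fsa.dict.speller.replacement-1=L\u00B7L\\=L\n" +
        "fsa.dict.speller.replacement-2=abc\\=xyz\n";

    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text)); // load() turns the escaped "\=" into a plain "="
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Variant (1): one long value, pairs separated by '|', sides separated by the first '='.
    static Map<String, String> parsePairs(String value) {
        Map<String, String> pairs = new LinkedHashMap<>();
        if (value == null || value.isEmpty()) {
            return pairs;
        }
        for (String pair : value.split("\\|")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed pair: " + pair);
            }
            pairs.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return pairs;
    }

    // Variant (2): keys named prefix + number, one pair per key, applied in numeric order.
    // A non-numeric suffix makes Integer.parseInt throw; a real parser might skip such keys.
    static Map<String, String> parseNumbered(Properties p, String prefix) {
        TreeMap<Integer, String> ordered = new TreeMap<>();
        for (String key : p.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                ordered.put(Integer.parseInt(key.substring(prefix.length())), p.getProperty(key));
            }
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String value : ordered.values()) {
            pairs.putAll(parsePairs(value));
        }
        return pairs;
    }

    public static void main(String[] args) {
        Properties p = load(SAMPLE);
        System.out.println(parsePairs(p.getProperty("fsa.dict.speller.replacements")));
        System.out.println(parsePairs(p.getProperty("fsa.dict.speller.map")));
        System.out.println(parseNumbered(p, "fsa.dict.speller.replacement-"));
    }
}
```

In this sketch both variants yield the same pairs. Variant (1) needs a
further escaping rule if a pattern ever has to contain '|' or '=' itself,
while variant (2) only has to keep the numbering in order.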
Any preferences? Different ideas?
Best regards,
Marcin
On 2013-04-23 21:53, Jaume Ortolà i Font wrote:
> Marcin,
>
> I attach again the Speller.java file with some minor changes. This
> problem is solved now:
>
> "There is a problem to be solved. The L -> L·L substitution adds a
> distance of 0, but the L·L-> L substitution adds 1. It should be always 0."
>
> Best,
> Jaume
>
> _______________________________________________
> Languagetool-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>