Re: [Languagetool] How to enable spellchecking?

Dominique Pellé Mon, 04 Jun 2012 12:40:11 -0700

Marcin Miłkowski <list-addr...@wp.pl> wrote:

> I replaced the 1.1.12 version with a current one, and it works for me on
> the command line for a 32-bit JVM. I will check in the other libraries;
> please see if it helps.


Breton spell checking:

* evel-se is no longer marked as a mistake (good)
* Gwelloc'h is still marked as a spurious mistake even
  though it's correct. I think it's because the tokenizer
  hack changes the apostrophe U+0027 into U+2019.
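If that is the cause, the lookup can only succeed once the typographic apostrophe is mapped back to the one stored in the dictionary. A minimal Python sketch, assuming a word list that stores U+0027; `DICTIONARY` and `is_known` are hypothetical names, not LanguageTool or Hunspell API, and the lower-casing merely stands in for Hunspell's own capitalization handling:

```python
# Hypothetical word list: entries use the ASCII apostrophe U+0027.
DICTIONARY = {"gwelloc'h", "evel-se"}

ASCII_APOSTROPHE = "\u0027"
TYPOGRAPHIC_APOSTROPHE = "\u2019"  # what the tokenizer hack produces


def is_known(word: str) -> bool:
    """Undo the U+0027 -> U+2019 change, then look the token up."""
    normalized = word.replace(TYPOGRAPHIC_APOSTROPHE, ASCII_APOSTROPHE)
    # Lower-casing stands in for Hunspell's own capitalization rules.
    return normalized.lower() in DICTIONARY


token = "Gwelloc\u2019h"  # the token as it appears after tokenization
print(token.lower() in DICTIONARY)  # False: U+2019 never matches
print(is_known(token))  # True
```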

But LanguageTool outputs some strange numbers now!?

$ echo "Gwelloc'h evel-se." | java -jar dist/LanguageTool.jar -l br -v
Expected text language: Breton
Working on STDIN...
464 rules activated for language Breton
<S> Gwelloc’h[gwell/J cmp]  evel-se[evel-se/A].[</S>]<P/>
71 119 101 108 108 111 99 -30 -128 -103 104 0
1.) Line 1, column 1, Rule ID: HUNSPELL_RULE
Message: Possible spelling mistake found
Suggestion: Gwelloc'h
Gwelloc'h evel-se.
^^^^^^^^^
Time: 781ms for 1 sentences (1.3 sentences/sec)
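Read as signed Java bytes, the numbers look like the UTF-8 encoding of the token "Gwelloc’h" followed by a 0, perhaps a NUL terminator for the native library (that part is a guess). The three negative values then decode to exactly U+2019, which would fit the apostrophe theory above. A quick Python check:

```python
# Numbers printed by LanguageTool, copied from the output above.
printed = "71 119 101 108 108 111 99 -30 -128 -103 104 0"

# Java bytes are signed (-128..127); map them back to 0..255.
raw = bytes(int(n) % 256 for n in printed.split())

# The trailing 0 looks like a C-style NUL terminator; drop it before decoding.
word = raw.rstrip(b"\x00").decode("utf-8")

print(ascii(word))  # 'Gwelloc\u2019h'
print([hex(b) for b in raw[7:10]])  # ['0xe2', '0x80', '0x99']
print(f"U+{ord(word[7]):04X}")  # U+2019
```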

* Improvement since yesterday: checking the Breton
  2-word sentence above used to take 1.7 sec; now it
  takes 0.781 sec.

French spell checking:

* "Il sera" and "Ils ont" are no longer marked as
  mistakes (good)
* "Jusqu'à" is still marked as a spelling error even
  though it's correct. That must be because of the
  tokenization: the dictionary contains "jusqu'à", but
  LT tokenizes the word and checks the parts separately:
  jusqu  '  à
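To illustrate why the split tokens miss the entry, here is a minimal Python sketch contrasting a tokenizer that splits on the apostrophe with one that keeps word'word sequences together. Both regexes and the tiny `DICTIONARY` are hypothetical and are not LT's actual French tokenizer:

```python
import re

# Hypothetical dictionary entries, as stored in the French word list.
DICTIONARY = {"jusqu'à", "il", "sera"}


def split_on_apostrophe(text: str) -> list[str]:
    """Naive tokenizer: the apostrophe becomes a token of its own."""
    return re.findall(r"\w+|'", text)


def keep_elisions(text: str) -> list[str]:
    """Keep word'word sequences together so they can be looked up whole."""
    return re.findall(r"\w+(?:'\w+)*", text)


text = "Jusqu'à demain"
print(split_on_apostrophe(text))  # ['Jusqu', "'", 'à', 'demain']
print(keep_elisions(text))  # ["Jusqu'à", 'demain']
print(keep_elisions(text)[0].lower() in DICTIONARY)  # True
```

Keeping every word'word sequence whole is too crude for French in general; the sketch only shows that the whole form "jusqu'à" can match the dictionary while the fragments cannot.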

I've also run my script to measure the startup time in
seconds for all languages when checking the 2-word
sentence "foo bar".

With Hunspell (default) at SVN r7231:

lang | #rules | startup time in sec (3 samples)
-----+--------+--------------------------------
 ast |     61 |  0.66 0.48 0.48
  br |    458 |  1.22 1.22 1.25
  ca |    397 |  0.90 0.95 0.90
  cs |      1 |  0.11 0.11 0.11
  zh |    328 |  2.26 2.26 2.26
  da |     22 |  0.91 0.90 0.91
  nl |    336 |  0.95 0.95 0.95
  en |    797 |  0.72 0.69 0.69
  eo |    274 |  0.73 0.76 0.73
  fr |   2052 |  0.80 0.80 0.81
  gl |    157 |  0.86 0.86 0.85
  be |      7 |  1.31 1.28 1.29
  de |   1406 |  2.31 2.37 2.31
  is |     39 |  0.93 0.87 0.88
  it |    116 |  0.53 0.51 0.53
  km |     24 |  0.60 0.57 0.56
  lt |      6 |  0.48 0.47 0.45
  ml |     23 |  0.88 0.85 0.84
  pl |   1029 |  1.39 1.34 1.44
  ro |    459 |  1.03 1.02 1.01
  ru |    153 |  1.36 1.36 1.43
  sk |     58 |  1.30 1.36 1.18
  sl |     86 |  0.87 0.95 0.86
  es |     70 |  0.77 0.76 0.75
  sv |     26 |  0.11 0.11 0.11
  tl |     44 |  0.47 0.39 0.39
  uk |     25 |  1.65 1.60 1.60


Same test using -d HUNSPELL_RULE to disable
Hunspell (the times are about the same!?)

lang | #rules | startup time in sec (3 samples)
-----+--------+--------------------------------
 ast |     61 |  0.49 0.48 0.48
  br |    458 |  1.22 1.29 1.24
  ca |    397 |  0.90 0.94 0.91
  cs |      1 |  0.11 0.11 0.12
  zh |    328 |  2.31 2.26 2.26
  da |     22 |  0.93 0.91 0.91
  nl |    336 |  1.03 0.98 1.01
  en |    797 |  0.83 0.76 0.74
  eo |    274 |  0.78 0.79 0.82
  fr |   2052 |  0.96 0.86 1.00
  gl |    157 |  0.97 1.05 1.09
  be |      7 |  1.33 1.52 1.53
  de |   1406 |  2.86 2.35 2.48
  is |     39 |  0.99 1.02 0.94
  it |    116 |  0.51 0.56 0.81
  km |     24 |  0.75 0.56 0.61
  lt |      6 |  0.54 0.55 0.50
  ml |     23 |  1.05 0.86 0.85
  pl |   1029 |  1.69 1.59 1.54
  ro |    459 |  1.29 1.13 1.21
  ru |    153 |  1.64 1.69 1.85
  sk |     58 |  1.29 1.51 1.54
  sl |     86 |  1.17 1.24 0.94
  es |     70 |  1.07 0.86 0.82
  sv |     26 |  0.11 0.14 0.12
  tl |     44 |  0.41 0.51 0.52
  uk |     25 |  1.68 1.89 2.03


Startup was significantly faster before the
Hunspell changes. These are the startup times
before Hunspell (SVN r6963):

lang | #rules | startup time in sec (3 samples)
-----+--------+--------------------------------
 ast |     61 |  0.30 0.27 0.26
  br |    437 |  0.49 0.49 0.48
  ca |    434 |  0.65 0.66 0.64
  cs |      1 |  0.11 0.10 0.10
  zh |    328 |  2.35 2.26 2.26
  da |     22 |  0.52 0.51 0.51
  nl |    336 |  0.61 0.59 0.60
  en |    787 |  0.72 0.68 0.68
  eo |    269 |  0.56 0.57 0.56
  fr |   2040 |  0.53 0.52 0.53
  gl |    157 |  0.67 0.66 0.67
  be |      7 |  0.48 0.47 0.47
  de |   1374 |  2.11 2.03 2.02
  is |     39 |  0.57 0.53 0.50
  it |    116 |  0.32 0.28 0.28
  km |     24 |  0.59 0.55 0.56
  lt |      6 |  0.20 0.21 0.20
  ml |     23 |  0.52 0.50 0.49
  pl |   1029 |  0.82 0.81 0.81
  ro |    459 |  0.67 0.68 0.66
  ru |    153 |  0.63 0.61 0.61
  sk |     58 |  0.65 0.64 0.63
  sl |     86 |  0.52 0.50 0.50
  es |     70 |  0.55 0.54 0.54
  sv |     26 |  0.11 0.11 0.10
  tl |     44 |  0.25 0.25 0.25
  uk |     25 |  1.22 1.22 1.23
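To put a number on the difference, the Python sketch below takes the median of the three samples for a few languages copied from the tables above and divides the r7231 times by the r6963 times. The choice of languages and of the median is arbitrary:

```python
from statistics import median

# Startup times in seconds (3 samples each), copied from the tables above:
# (r7231 default, r7231 with -d HUNSPELL_RULE, r6963 before Hunspell)
TIMINGS = {
    "br": ([1.22, 1.22, 1.25], [1.22, 1.29, 1.24], [0.49, 0.49, 0.48]),
    "fr": ([0.80, 0.80, 0.81], [0.96, 0.86, 1.00], [0.53, 0.52, 0.53]),
    "pl": ([1.39, 1.34, 1.44], [1.69, 1.59, 1.54], [0.82, 0.81, 0.81]),
    "be": ([1.31, 1.28, 1.29], [1.33, 1.52, 1.53], [0.48, 0.47, 0.47]),
    "zh": ([2.26, 2.26, 2.26], [2.31, 2.26, 2.26], [2.35, 2.26, 2.26]),
}


def slowdown(lang: str) -> tuple[float, float]:
    """Median r7231 time / median r6963 time, default and with Hunspell disabled."""
    default, disabled, before = (median(s) for s in TIMINGS[lang])
    return default / before, disabled / before


for lang in TIMINGS:
    with_hs, without_hs = slowdown(lang)
    print(f"{lang}: {with_hs:.2f}x default, {without_hs:.2f}x with -d HUNSPELL_RULE")
```

With these numbers br comes out about 2.5x slower whether or not HUNSPELL_RULE is disabled, while zh stays at 1.00x.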

Any idea why LT has become slower even when
Hunspell is disabled with -d HUNSPELL_RULE?

I assume that the Hunspell change causes the
slowdown between r6963 (fast) and r7231 (slow), but
I could be wrong since there are other changes in between.

If I had more time, I could make a graph of LT's
startup time for some languages vs. SVN revision.

My script to measure startup time is available here:

http://dominique.pelle.free.fr/startup-time-lt.sh
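A rough Python equivalent of that measurement, for anyone who prefers not to run the shell script; it is not a port of startup-time-lt.sh, whose contents are not reproduced here. It assumes the jar path and flags from the Breton example above, and `lt_command`/`time_command` are hypothetical helper names:

```python
import subprocess
import time


def lt_command(lang: str, disable_hunspell: bool = False) -> list[str]:
    """Command line as in the Breton example above (jar path assumed)."""
    cmd = ["java", "-jar", "dist/LanguageTool.jar", "-l", lang]
    if disable_hunspell:
        cmd += ["-d", "HUNSPELL_RULE"]
    return cmd


def time_command(cmd: list[str], stdin_text: str = "foo bar",
                 samples: int = 3) -> list[float]:
    """Run cmd `samples` times, feeding stdin_text, and return wall-clock seconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Exit status is ignored on purpose: only the elapsed time matters here.
        subprocess.run(cmd, input=stdin_text, text=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                       check=False)
        timings.append(round(time.perf_counter() - start, 2))
    return timings
```

For example, `time_command(lt_command("br", disable_hunspell=True))` would produce three samples comparable to the br row of the -d HUNSPELL_RULE table.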

Regards
-- Dominique

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
