> On 2014-09-03 12:30, R.J. Baars wrote:
>> Marcin,
>>
>> For English, there are .info files in /resource/ as well as in
>> /resource/hunspell.
>> The first seems to be for the tagging dict, the second for the speller.
> Ah, of course, there should be one .info file per one .dict file. I
> thought you were asking about one dictionary file.
>
>>
>> (I would prefer spell-checker for directory name.)
>>
>> The content of the info file for Dutch should probably be:
>> fsa.dict.speller.ignore-numbers=false
>> fsa.dict.speller.ignore-all-uppercase=false
>> fsa.dict.speller.ignore-camel-case=true
>> fsa.dict.speller.ignore-punctuation=false
> Note: if you don't have all punctuation in your dictionary, this will
> make the speller complain about all commas, colons, hyphens etc.
>
>> fsa.dict.input-conversion=ij ij, IJ IJ

>
> You need to use normal Unicode here or Java escaping, not HTML escaping.

This was caused by email conversion ;-)

>
>> fsa.dict.output-conversion=ij ij, IJ IJ
> Do you have such characters in the dictionary file? If not, then you
> don't need the output conversion.

I need to make sure that a word like IJmuiden (place) is never accepted as
Ijmuiden. In Hunspell, I converted every incoming ij into the ligature,
and back going out, to make that possible.
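
A minimal Python sketch of that round trip. It only mimics the idea of input/output conversion pairs and is not the actual speller code; the function names and the one-word dictionary are made up for the example, and it assumes the dictionary stores the ligature forms.

```python
# Sketch only: all names here are hypothetical.
# "\u0133" is the lowercase ij ligature, "\u0132" the uppercase one.
INPUT_CONVERSION = [("ij", "\u0133"), ("IJ", "\u0132")]
OUTPUT_CONVERSION = [(lig, plain) for plain, lig in INPUT_CONVERSION]


def convert(word, pairs):
    """Apply each (source, target) replacement in order."""
    for source, target in pairs:
        word = word.replace(source, target)
    return word


# Hypothetical dictionary, stored with ligatures.
DICTIONARY = {convert("IJmuiden", INPUT_CONVERSION)}


def is_correct(word):
    """Convert the incoming word, then look it up."""
    return convert(word, INPUT_CONVERSION) in DICTIONARY


def suggest_output(entry):
    """Convert a dictionary entry back to plain letters for display."""
    return convert(entry, OUTPUT_CONVERSION)
```

With these pairs, "IJmuiden" becomes the ligature form and is found, while "Ijmuiden" matches neither "ij" nor "IJ", stays unconverted, and is therefore not found.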
>
>> fsa.dict.speller.runon-words=false
>> fsa.dict.speller.locale=nl_NL
>> fsa.dict.speller.convert-case=false
>> fsa.dict.speller.ignore-diacritics=true
>> fsa.dict.speller.replacement-pairs=y ij, ei ij
>> fsa.dict.speller.equivalent-chars=
>> fsa.dict.frequency-included=true
>> fsa.dict.encoding=utf-8
>> fsa.dict.separator=
>> fsa.dict.author=R. Baars;
>>
>> I am not sure about separator, equivalent chars and the locale.
> Separator is just used for internal management (usually it's a plus
> character). Doesn't really matter unless you want to use "+" as an entry
> (and you would have to if you have "ignore-punctuation" set to false).
>
>> I don't quite get the difference between diacritics, equivalent chars and
>> replacement pairs. Diacritics seems to me to be part of equivalent and is
>> a kind of automatic replacement.
> Diacritics is automatic and faster than replacement pairs. Roughly the
> same as equivalent chars.
>
>> ei ij is a replacement, á and a are taken care of by diacritics, and I
>> guess Dutch does not have equivalents ...
>>
>> Right?
> What about apostrophes? Do you want them normalized or not?
Yes I guess I would ...
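
As a rough illustration of how the three options discussed above differ (a sketch under assumptions, not how the speller is implemented): ignoring diacritics can be done generically by stripping combining marks, equivalent chars behave like a fixed one-to-one character mapping (the apostrophe mapping below is only an assumed example), and replacement pairs are substring substitutions that produce extra candidates. All function names are hypothetical.

```python
import unicodedata


def strip_diacritics(word):
    """Generic: decompose and drop combining marks, so that á becomes a."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Assumed example of an equivalent-chars mapping: the typographic
# apostrophe (U+2019) is treated as the plain ASCII one.
EQUIVALENT_CHARS = {"\u2019": "'"}


def normalize_equivalents(word):
    """Replace each character by its equivalent, if one is defined."""
    return "".join(EQUIVALENT_CHARS.get(ch, ch) for ch in word)


def parse_pairs(value):
    """Parse a value such as 'y ij, ei ij' into (source, target) tuples."""
    return [tuple(item.split()) for item in value.split(",") if item.strip()]


def replacement_candidates(word, pairs):
    """Return the variants obtained by substituting one occurrence of a source."""
    variants = set()
    for source, target in pairs:
        start = word.find(source)
        while start != -1:
            variants.add(word[:start] + target + word[start + len(source):])
            start = word.find(source, start + 1)
    return variants
```

For example, `replacement_candidates("bakkery", parse_pairs("y ij, ei ij"))` produces the candidate "bakkerij", which would then be checked against the dictionary.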
>
> Regards,
> Marcin
>
>>
>>
>>
>>> On 2014-09-03 10:58, R.J. Baars wrote:
>>>> To add the word frequencies, I am directed by the wiki to an address
>>>> where there is indeed a frequency list. But it has only 187000 words,
>>>> while I have 1.2 million Dutch words and their frequencies myself.
>>> Probably the probabilities of their occurrence are quite low. I tried
>>> replacing that list with a bigger one for Polish; that indeed made the
>>> dictionary file bigger, but nothing else changed much.
>>>
>>>> The frequency is just a number; what is expected there? Is this number
>>>> a plain ratio, an occurrence count, or something else, like logarithmic?
>>>> Will I have to convert to that format, or is a plain word<tab>number
>>>> an option too?
>>> Log scale, I believe. You might want to filter out some of the lower
>>> results, as well, as they don't really help and only make files bigger.
>>>
>>> Marcin
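
A sketch of what such a conversion could look like, assuming raw occurrence counts as input. The number of buckets (26) and the cut-off (5) are assumptions chosen for the example, not values confirmed in this thread, so check what the dictionary builder actually expects. The function name is hypothetical.

```python
import math


def to_log_buckets(counts, buckets=26, min_count=5):
    """Map raw counts to integer log-scale buckets 0..buckets-1.

    Words seen fewer than min_count times are dropped, since they add
    little and only make the file bigger.
    """
    kept = {word: c for word, c in counts.items() if c >= min_count}
    if not kept:
        return {}
    min_log = math.log(min_count)
    span = math.log(max(kept.values())) - min_log
    result = {}
    for word, count in kept.items():
        if span == 0:
            # Every kept word has the same count: put them in the top bucket.
            result[word] = buckets - 1
        else:
            scaled = (math.log(count) - min_log) / span
            result[word] = min(buckets - 1, int(scaled * buckets))
    return result
```

The most frequent word lands in the top bucket and a word at the cut-off lands in bucket 0; anything rarer is filtered out before the dictionary is built.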
>>>
>>>> Ruud
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Slashdot TV.
>>>> Video for Nerds.  Stuff that matters.
>>>> http://tv.slashdot.org/
>>>> _______________________________________________
>>>> Languagetool-devel mailing list
>>>> Languagetool-devel@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>>>>
>>>>