2013/7/15 Marcin Miłkowski <[email protected]>:
> Hi Jaume,
>
> On 2013-07-15 21:16, Jaume Ortolà i Font wrote:
>> Hi, Marcin.
>>
>> I have tested the current code (1.8.0-SNAPSHOT) and everything is OK,
>> all the changes are there. Thank you.
>
> Great. We'll release 1.7.1, this is just a minor bug fix.
>
> BTW, when you see something you want to fix, just make a fork on github
> to fix it, then file an issue, and then make a pull request associated
> with that issue. That way, it will be much easier to develop the library
> with your changes.

I'll try to do it.

> Also, if you find the time, it would be good to use a proper way of
> removing duplicates (right now we lose information from CandidateData
> that might be significant for something; I know this is me being fussy,
> the code is quite clean as it is).

There are different ways to do it:
- We could test for duplicates in addCandidate()...
- "candidates" could be a Set, but then it needs to be converted to a
List to be sorted...

If you want to keep the distance information outside Speller.java,
that's a different matter.
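
A minimal sketch of the first option (testing for duplicates when a
candidate is added), so that the distance information is not lost. The
Candidate class here is a hypothetical, simplified stand-in for
CandidateData, and the class and method names are assumptions for
illustration, not the actual Speller.java code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CandidateDedup {
    /** Hypothetical stand-in for CandidateData: a word plus its edit distance. */
    static final class Candidate implements Comparable<Candidate> {
        final String word;
        final int distance;

        Candidate(String word, int distance) {
            this.word = word;
            this.distance = distance;
        }

        @Override
        public int compareTo(Candidate other) {
            return Integer.compare(distance, other.distance);
        }
    }

    // Keyed by word, so a duplicate is detected at the moment it is added.
    private final Map<String, Candidate> candidates = new LinkedHashMap<>();

    /** Adds a candidate; for a repeated word, keeps the entry with the smaller distance. */
    void addCandidate(String word, int distance) {
        Candidate existing = candidates.get(word);
        if (existing == null || distance < existing.distance) {
            candidates.put(word, new Candidate(word, distance));
        }
    }

    /** Returns the unique candidates sorted by distance; ties keep insertion order. */
    List<Candidate> sortedCandidates() {
        List<Candidate> result = new ArrayList<>(candidates.values());
        Collections.sort(result); // stable sort
        return result;
    }

    public static void main(String[] args) {
        CandidateDedup dedup = new CandidateDedup();
        dedup.addCandidate("casa", 2);
        dedup.addCandidate("cosa", 1);
        dedup.addCandidate("casa", 1); // duplicate word, better distance
        for (Candidate c : dedup.sortedCandidates()) {
            System.out.println(c.word + " " + c.distance);
        }
    }
}
```

With a map keyed by the word, the conversion to a List happens only once,
right before sorting, and the candidate data of the best duplicate is kept.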


The next step for improving the suggestions would be to use a list of
frequent words. I'm thinking of just a list of manually selected words
or at most a few thousand words from a frequency dictionary.
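
A rough sketch of how such a list might be used, assuming the suggestions
arrive already ordered by the speller. The FrequentWordRanker class and its
rank() method are hypothetical names, and whether frequent words should
override the distance ordering is an open design question that this sketch
does not settle:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FrequentWordRanker {
    // Assumed to be a small, manually selected list or a frequency-dictionary extract.
    private final Set<String> frequentWords;

    FrequentWordRanker(Set<String> frequentWords) {
        this.frequentWords = frequentWords;
    }

    /** Stable partition: frequent words first, the rest after, relative order kept. */
    List<String> rank(List<String> suggestions) {
        List<String> frequent = new ArrayList<>();
        List<String> rest = new ArrayList<>();
        for (String suggestion : suggestions) {
            if (frequentWords.contains(suggestion)) {
                frequent.add(suggestion);
            } else {
                rest.add(suggestion);
            }
        }
        frequent.addAll(rest);
        return frequent;
    }

    public static void main(String[] args) {
        Set<String> frequent = new HashSet<>(Arrays.asList("the", "then"));
        FrequentWordRanker ranker = new FrequentWordRanker(frequent);
        System.out.println(ranker.rank(Arrays.asList("thee", "the", "then", "tee")));
    }
}
```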

Regards,
Jaume


> Regards,
> Marcin
>
>>
>> Now we need a release with the changes, and we'll be able to adapt the tests.
>>
>> Regards,
>> Jaume
>> Salutacions,
>> Jaume Ortolà
>> www.riuraueditors.cat
>>
>>
>>
>> 2013/7/15 Marcin Miłkowski <[email protected]>:
>>> On 2013-07-15 12:41, Jaume Ortolà i Font wrote:
>>>> Thanks, Marcin.
>>>>
>>>> Some remarks. The improvements I sent to the list 15 days ago have not
>>>> been added, and moreover I have found more bugs.
>>> I'm really sorry but there are 200 mails from the mailing list over the
>>> last two weeks and I have been away from my e-mail. Could you please add
>>> your changes as issues on github for morfologik-stemming? That
>>> would make it much easier for us to track these things.
>>>
>>>> I attach the code I'm using now and explain briefly the reasons for the 
>>>> changes.
>>>>
>>>> - In the getAllReplacements method we need to make sure that the
>>>> replacements are done from left to right. We must complete the
>>>> for-loop of the replacement pairs, choose the first possible
>>>> replacement (from left to right) and then start the two new branches
>>>> (with and without replacement). Otherwise, some replacements are not
>>>> done.
>>> OK, this sounds OK. I integrated your changes.
>>>
>>>> - If there is "ss" as a key in the replacement pairs, and somebody
>>>> uses a long string of s ("ssssssssss...") as input text, this can
>>>> cause the method to consume all the memory, as the algorithm is
>>>> exponential (2^(number of replacements)). This happened to us in an
>>>> online server, and the LT server crashed. The depth of the recursive
>>>> algorithm should be limited to 4 or 5 levels at most.
>>> Is that in getAllReplacements()?
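
To make the two points above concrete (leftmost replacement first, and a
cap on the recursion depth), here is a minimal sketch. It is not the
getAllReplacements() code from Speller.java: the class name, the method
signatures, and the MAX_DEPTH value are assumptions, keys are assumed to be
non-empty, and when several keys match at the same position only one of
them is tried:

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class BoundedReplacements {
    // Assumed cap, following the "4 or 5 levels" suggestion.
    static final int MAX_DEPTH = 4;

    static Set<String> allReplacements(String word, Map<String, String> pairs) {
        Set<String> results = new LinkedHashSet<>();
        collect(word, 0, 0, pairs, results);
        return results;
    }

    private static void collect(String word, int from, int depth,
                                Map<String, String> pairs, Set<String> results) {
        results.add(word);
        if (depth >= MAX_DEPTH) {
            return; // bounds the work to at most 2^(MAX_DEPTH + 1) - 1 calls
        }
        // Complete the loop over all pairs to find the leftmost match at or after 'from'.
        int bestIndex = -1;
        String bestKey = null;
        for (String key : pairs.keySet()) {
            int index = word.indexOf(key, from);
            if (index >= 0 && (bestIndex < 0 || index < bestIndex)) {
                bestIndex = index;
                bestKey = key;
            }
        }
        if (bestKey == null) {
            return; // nothing left to replace
        }
        String value = pairs.get(bestKey);
        String replaced = word.substring(0, bestIndex) + value
                + word.substring(bestIndex + bestKey.length());
        // Branch 1: apply the replacement and continue after the inserted text.
        collect(replaced, bestIndex + value.length(), depth + 1, pairs, results);
        // Branch 2: skip this match and continue one character to the right.
        collect(word, bestIndex + 1, depth + 1, pairs, results);
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("ss", "s");
        // A long run of "s" no longer blows up: the result set stays small.
        System.out.println(allReplacements("ssssssssssssssssssss", pairs).size());
    }
}
```

Counting both branches against the depth keeps the worst case fixed
regardless of the input length, at the cost of missing some variants of
very long words.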
>>>
>>>> - It is possible that different "words to check" give the same
>>>> suggestion. So at some point we need to remove duplicates. I do this
>>>> at the end of findReplacements().
>>> You are right. We could probably write the same code in a slightly more
>>> elegant way, without converting this to a LinkedHashSet but simply by
>>> adding to a set when iterating the list.
>>>
>>>> - The conditions around line 238 (current github version 1.7) are not
>>>> correct. The first isInDictionary makes the lower case conversion
>>>> useless:
>>>>
>>>>                       if (isInDictionary(wordChecked)
>>>>                               && dictionaryMetadata.isConvertingCase()
>>>>                               && isMixedCase(wordChecked)
>>>>                               && isInDictionary(wordChecked
>>>>                                   .toLowerCase(dictionaryMetadata.getLocale())))
>>>>
>>>> I think they should be something like:
>>>>
>>>>             if (isInDictionary(wordChecked)
>>>>                 || (dictionaryMetadata.convertCase
>>>>                 && isMixedCase(wordChecked)
>>>>                 && isInDictionary(wordChecked
>>>>                     .toLowerCase(dictionaryMetadata.dictionaryLocale))))
>>> Fixed!
>>>
>>> I tried to add your fixes but your code is now quite far away from ours,
>>> so diff does not give any meaningful output. Please review the code on
>>> github, and if needed, file an issue over changes that need to be done.
>>>
>>> Regards,
>>> Marcin
>>>
>>>> Regards,
>>>> Jaume Ortolà
>>>> Salutacions,
>>>> Jaume Ortolà
>>>> www.riuraueditors.cat
>>>>
>>>>
>>>>
>>>> 2013/7/15 Marcin Miłkowski <[email protected]>:
>>>>> On 2013-07-15 10:56, Marcin Miłkowski wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Dawid just released morfologik 1.7 on Maven. So we can actually go on
>>>>>> and include a newer version in LT.
>>>>>>
>>>>>> The new version still does not support compounding but it has all the
>>>>>> features required for getting better diacritic suggestions.
>>>>> Here's the documentation:
>>>>>
>>>>> http://wiki.languagetool.org/hunspell-support#toc5
>>>>>
>>>>> Best,
>>>>> Marcin
>>>>>
>>>>>
>>>>>> Best,
>>>>>> Marcin
>>>>>>
>>>>>> On 2013-07-02 08:59, Marcin Miłkowski wrote:
>>>>>>> On 2013-07-02 01:11, Jaume Ortolà i Font wrote:
>>>>>>>> Hi Marcin,
>>>>>>>>
>>>>>>>> I have been using the still unreleased code of morfologik-stemming and I
>>>>>>>> have made improvements to Speller.java for some previously unforeseen
>>>>>>>> cases. See the attachment.
>>>>>>>>
>>>>>>>> In order to complete the development, and test & debug with all
>>>>>>>> languages, perhaps we could temporarily include the morfologik module
>>>>>>>> inside LanguageTool. This will make things easier. What do you think?
>>>>>>> No. I should make a release, forking morfologik makes no sense to me.
>>>>>>>
>>>>>>> The only thing that stops me is the lack of time to work on compounds.
>>>>>>>
>>>>>>> Best,
>>>>>>> Marcin
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>>
>>>>>>> This SF.net email is sponsored by Windows:
>>>>>>>
>>>>>>> Build for Windows Store.
>>>>>>>
>>>>>>> http://p.sf.net/sfu/windows-dev2dev
>>>>>>> _______________________________________________
>>>>>>> Languagetool-devel mailing list
>>>>>>> [email protected]
>>>>>>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>>>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> See everything from the browser to the database with AppDynamics
>>>>> Get end-to-end visibility with application monitoring from AppDynamics
>>>>> Isolate bottlenecks and diagnose root cause in seconds.
>>>>> Start your free trial of AppDynamics Pro today!
>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>>>> _______________________________________________
>>>>> Languagetool-devel mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>>>>>
