2013/7/15 Marcin Miłkowski <[email protected]>:
> Hi Jaume,
>
> On 2013-07-15 21:16, Jaume Ortolà i Font wrote:
>> Hi, Marcin.
>>
>> I have tested the current code (1.8.0-SNAPSHOT) and everything is OK,
>> all the changes are there. Thank you.
>
> Great. We'll release 1.7.1, this is just a minor bug fix.
>
> BTW, when you see something you want to fix, just make a fork on github
> to fix it, then file an issue, and then make a pull request associated
> with that issue. That way, it will be much easier to develop the library
> with your changes.
I'll try to do it.

> Also, if you'll find time to use a proper way of removing duplicates
> (now we lose information from CandidateData that might be significant
> for something - I know this is me being fussy, this is quite clean).

There are different ways to do it:
- We could test for duplicates in addCandidate()...
- "candidates" could be a Set, but then it needs to be converted to a
  List to be sorted...
If you want to keep the distance information outside Speller.java,
that's a different matter.

The next step for improving the suggestions would be to use a list of
frequent words. I'm thinking of just a list of manually selected words
or at most a few thousand words from a frequency dictionary.

Regards,
Jaume

> Regards,
> Marcin
>
>>
>> Now we need a release with the changes, and we'll be able to adapt the tests.
>>
>> Regards,
>> Jaume
>> Salutacions,
>> Jaume Ortolà
>> www.riuraueditors.cat
>>
>>
>>
>> 2013/7/15 Marcin Miłkowski <[email protected]>:
>>> On 2013-07-15 12:41, Jaume Ortolà i Font wrote:
>>>> Thanks, Marcin.
>>>>
>>>> Some remarks. The improvements I sent to the list 15 days ago have not
>>>> been added, and moreover I have found more bugs.
>>> I'm really sorry but there are 200 mails from the mailing list over the
>>> last two weeks and I have been away from my e-mail. Could you please add
>>> your changes as issues on github for morfologik-stemming? That would
>>> make it much easier for us to track these things.
>>>
>>>> I attach the code I'm using now and explain briefly the reasons for the
>>>> changes.
>>>>
>>>> - In the getAllReplacements method we need to make sure that the
>>>> replacements are done from left to right. We must complete the
>>>> for-loop of the replacement pairs, choose the first possible
>>>> replacement (from left to right) and then start the two new branches
>>>> (with and without replacement). Otherwise, some replacements are not
>>>> done.
>>> OK, this sounds OK. I integrated your changes.
>>>
>>>> - If there is "ss" as a key in the replacement pairs, and somebody
>>>> uses a long string of s ("ssssssssss...") as input text, this can
>>>> cause the method to consume all the memory, as the algorithm is
>>>> exponential (2^(number of replacements)). This happened to us on an
>>>> online server, and the LT server crashed. The depth of the recursive
>>>> algorithm should be limited to 4 or 5 levels at most.
>>> Is that in getAllReplacements()?
>>>
>>>> - It is possible that different "words to check" give the same
>>>> suggestion. So at some point we need to remove duplicates. I do this
>>>> at the end of findReplacements().
>>> You are right. We could probably write the same code in a slightly more
>>> elegant way, without converting this to a LinkedHashSet but simply by
>>> adding to a set when iterating the list.
>>>
>>>> - The conditions around line 238 (current github version 1.7) are not
>>>> correct. The first isInDictionary makes the lower case conversion
>>>> useless:
>>>>
>>>> if (isInDictionary(wordChecked)
>>>>     && dictionaryMetadata.isConvertingCase()
>>>>     && isMixedCase(wordChecked)
>>>>     && isInDictionary(wordChecked.toLowerCase(dictionaryMetadata.getLocale())))
>>>>
>>>> I think they should be something like:
>>>>
>>>> if (isInDictionary(wordChecked)
>>>>     || (dictionaryMetadata.convertCase
>>>>         && isMixedCase(wordChecked)
>>>>         && isInDictionary(wordChecked
>>>>             .toLowerCase(dictionaryMetadata.dictionaryLocale))))
>>> Fixed!
>>>
>>> I tried to add your fixes but your code is now quite far away from ours,
>>> so diff does not give any meaningful output. Please review the code on
>>> github, and if needed, file an issue for any changes that still need to
>>> be made.
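
The points discussed above (left-to-right replacements, a cap on the recursion depth, and removing duplicate candidates) can be illustrated with a small sketch. This is not the morfologik Speller code: the class name ReplacementSketch, the method allReplacements, the MAX_DEPTH constant and the Map-based representation of replacement pairs are all hypothetical, chosen only for illustration, and the code assumes Java 9 or later for Map.of and List.of. Each recursive call only looks to the right of the previous replacement, at most MAX_DEPTH replacements are applied to one candidate, and candidates are added to a LinkedHashSet as they are generated, so duplicates never reach the result list.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; names and data layout are assumptions,
// not the actual morfologik-stemming implementation.
class ReplacementSketch {
    // Hypothetical cap on replacements per candidate (the thread suggests 4 or 5).
    static final int MAX_DEPTH = 4;

    // Returns the word itself plus every variant reachable with at most
    // MAX_DEPTH non-overlapping replacements, in generation order, no duplicates.
    static List<String> allReplacements(String word, Map<String, List<String>> pairs) {
        Set<String> out = new LinkedHashSet<>();
        expand(word, 0, 0, pairs, out);
        return new ArrayList<>(out);
    }

    private static void expand(String word, int from, int depth,
                               Map<String, List<String>> pairs, Set<String> out) {
        out.add(word); // the set silently drops candidates seen before
        if (depth >= MAX_DEPTH) {
            return; // bounds the otherwise exponential growth on inputs like "ssss..."
        }
        // Scan left to right, starting after the previously replaced segment.
        for (int i = from; i < word.length(); i++) {
            for (Map.Entry<String, List<String>> pair : pairs.entrySet()) {
                String key = pair.getKey();
                if (key.isEmpty() || !word.startsWith(key, i)) {
                    continue;
                }
                for (String replacement : pair.getValue()) {
                    String replaced = word.substring(0, i) + replacement
                            + word.substring(i + key.length());
                    // Continue to the right of the text just inserted.
                    expand(replaced, i + replacement.length(), depth + 1, pairs, out);
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(allReplacements("ssss", Map.of("ss", List.of("z"))));
        // prints [ssss, zss, zz, szs, ssz]
    }
}
```

With the cap in place the number of candidates grows polynomially with the input length instead of exponentially. A real implementation would probably also want to keep per-candidate data such as the edit distance, which this sketch leaves out.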
>>>
>>> Regards,
>>> Marcin
>>>
>>>> Regards,
>>>> Jaume Ortolà
>>>> Salutacions,
>>>> Jaume Ortolà
>>>> www.riuraueditors.cat
>>>>
>>>>
>>>>
>>>> 2013/7/15 Marcin Miłkowski <[email protected]>:
>>>>> On 2013-07-15 10:56, Marcin Miłkowski wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Dawid just released morfologik 1.7 on Maven. So we can actually go on
>>>>>> and include a newer version in LT.
>>>>>>
>>>>>> The new version still does not support compounding but it has all the
>>>>>> features required for getting better diacritic suggestions.
>>>>> Here's the documentation:
>>>>>
>>>>> http://wiki.languagetool.org/hunspell-support#toc5
>>>>>
>>>>> Best,
>>>>> Marcin
>>>>>
>>>>>
>>>>>> Best,
>>>>>> Marcin
>>>>>>
>>>>>> On 2013-07-02 08:59, Marcin Miłkowski wrote:
>>>>>>> On 2013-07-02 01:11, Jaume Ortolà i Font wrote:
>>>>>>>> Hi Marcin,
>>>>>>>>
>>>>>>>> I have been using the still unreleased code of morfologik-stemming and
>>>>>>>> I have made improvements to Speller.java for some previously
>>>>>>>> unforeseen cases. See the attachment.
>>>>>>>>
>>>>>>>> In order to complete the development, and test & debug with all
>>>>>>>> languages, perhaps we could temporarily include the morfologik module
>>>>>>>> inside LanguageTool. This will make things easier. What do you think?
>>>>>>> No. I should make a release, forking morfologik makes no sense to me.
>>>>>>>
>>>>>>> The only thing that stops me is the lack of time to work on compounds.
>>>>>>>
>>>>>>> Best,
>>>>>>> Marcin
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Languagetool-devel mailing list
>>>>>>> [email protected]
>>>>>>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
