Re: Please help test Chrome extension beta
Great!

But I got an error while installing this extension: "Cannot unzip". I installed the extension after unpacking it and enabling developer mode in Google Chrome (Win 8.1). In this mode I see a warning: "Unrecognized manifest key 'applications'." After that, the extension works fine.

> Friday, May 6, 2016, 16:19 +03:00 from Daniel Naber:
>
> Hi,
>
> please help test a new version of the Chrome extension:
> http://forum.languagetool.org/t/please-help-test-chrome-extension-beta/847
>
> Please also do this if Chrome is not your default browser, because soon
> Firefox will be compatible with Chrome extensions and then this new
> version will become the LT add-on for Firefox, too (Firefox 48 will
> probably be the first version to be compatible enough).
>
> Regards
>  Daniel

--
Yakov Reztsov

--
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
Re: Please help test Chrome extension beta
I just tried this extension in Firefox 48.0a2: I had to turn off signature checking to install it, and there's no Ukrainian translation, but otherwise it works well.

One thing I noted (and it looks like the same behavior as in the Firefox extension): if there is a spelling error and a grammar error on the same word, both are shown. On the webpage checker (e.g. on http://langaugetool.org) only the grammar one is shown (which I like).

The other thing is usability: currently the LT popup window is transient, so if I were to fix the errors reported in the text field, I would click on it and the LT window would go away. I would copy the errors into some text editor, but the text there is not selectable.

Regards,
Andriy

2016-05-06 14:31 GMT-04:00 Yakov Reztsov:
> Great!
> But I got an error while installing this extension: "Cannot unzip"
> I installed this extension after unpacking it and enabling developer mode
> in Google Chrome (Win 8.1)
>
> In this mode I see a warning: "Unrecognized manifest key 'applications'."
> After that the extension works fine.
>
> [...]
Re: ignoring certain tokens in rules
Jaume Ortolà i Font wrote:
> Hi,
>
> I think Marcin talked about this idea some time ago.
>
> Sometimes tokens like quotation marks (or other characters) should be
> ignored in some rules. That is, the sentence should be checked as if the
> token were not present. Any idea about how it could be implemented?
>
> Alternatively, tokens like this one could be added to the patterns:
>
> [“‘”«"']
>
> I would need to modify a few dozen rules. But perhaps this is the best
> solution: it gives more control over the rule, the suggestions, possible
> false alarms, and so on. What do you think?
>
> Regards,
> Jaume Ortolà

I have not looked in detail at what the French grammar checker Grammalecte [1] does, but I think it checks input text in multiple passes. In some passes, pre-processor rules eliminate pieces of text. For example, the pre-processor can eliminate "useless" punctuation or locutions made of multiple words. I see pre-processor rules in Grammalecte such as:

[«»“”„"`¹²³⁴⁵⁶⁷⁸⁹⁰]+ -> *

This rule eliminates a few "useless" characters.

[(]\w+[)] -> *

This rule eliminates text in parentheses such as (foo).

The important thing to keep in mind is that the sentence is checked multiple times:

* the first pass checks the text as-is;
* the second pass checks the text again, after applying the pre-processor rules.

It seems like a good idea.

Regards
Dominique

[1] http://www.dicollecte.org/grammalecte/
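The multi-pass idea described above can be sketched as follows. This is a hypothetical Java illustration (class, method, and pattern names are my own), not code from Grammalecte or LanguageTool; replacing each removed character with a space instead of deleting it keeps character offsets stable between passes, so error positions reported on the pre-processed text still map back to the original.

```java
import java.util.regex.Pattern;

public class PreprocessDemo {

    // Pass 1 would check the text as-is; pass 2 checks this simplified copy.
    private static final Pattern QUOTES = Pattern.compile("[«»“”„\"`]+");
    private static final Pattern PARENS = Pattern.compile("\\([^()]*\\)");

    // Blank out "useless" characters and parentheticals, preserving length
    // (and therefore offsets) by substituting spaces for removed characters.
    static String preprocess(String text) {
        String result = QUOTES.matcher(text)
                .replaceAll(m -> " ".repeat(m.group().length()));
        result = PARENS.matcher(result)
                .replaceAll(m -> " ".repeat(m.group().length()));
        return result;
    }

    public static void main(String[] args) {
        String original = "He said «hello» (foo bar) to me.";
        // Same length as the input, with quotes and the parenthetical blanked.
        System.out.println(preprocess(original));
    }
}
```

A checker would then run its rules once on `original` and once on `preprocess(original)`, merging the results.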
Re: probability theory code review?
Hi,

Sadly, my math is weak, but I will give it a try. Just make sure to re-check :)

On Thu, Aug 06, 2015 at 11:29:05AM +0200, Daniel Naber wrote:
> we're using a bit of probability theory to calculate ngram probabilities.
> This way we can decide which word of a homophone pair like there/their
> is (probably) correct. Is anybody here familiar with probability theory
> and could review that code? The main part is here:
>
> https://github.com/languagetool-org/languagetool/blob/master/languagetool-core/src/main/java/org/languagetool/languagemodel/BaseLanguageModel.java#L41

(I updated the link since this mail is late...) Below is the relevant function in its full form.

> + Probability getPseudoProbability(List context) {
> +   int maxCoverage = 0;
> +   int coverage = 0;
> +   long firstWordCount = lm.getCount(context.get(0));
> +   maxCoverage++;

Off topic: this variable could be initialized to 1 directly on the first line of the function.

> +   if (firstWordCount > 0) {
> +     coverage++;
> +   }
> +   // chain rule:

The chain rule is P(A,B,C,...) = P(A) * P(B|A) * P(C|A,B) * ..., so the line below would be P(A):

> +   double p = (double) (firstWordCount + 1) / (totalTokenCount + 1);

This looks okay, but (assuming you are going for Laplace add-one smoothing) you would have to add not 1 but the vocabulary size of the n-gram model to "totalTokenCount" (i.e. all unique n-grams, which for unigrams means all unique "syntactic words"). Another smoothing approach *may* work better.

> +   debug("P for %s: %.20f (%d)\n", context.get(0), p, firstWordCount);
> +   for (int i = 2; i <= context.size(); i++) {
> +     List subList = context.subList(0, i);
> +     long phraseCount = lm.getCount(subList);
> +     double thisP = (double) (phraseCount + 1) / (firstWordCount + 1);

This is the place where the conditional probabilities within the chain are calculated. A conditional probability can be calculated as follows:

P(B|A) = P(A,B) / P(A)

Using the conditional probability of the token "is" given the token "this" as an example, it would look like this:

P("is"|"this") = P("this","is") / P("this")

where

P("this","is") = C("this is") / C(all 2-grams)

(C() denotes the count of the argument), so I would have expected something like:

+   double thisP = (double) ((ngramCount + 1) / (countOfAllNgrams + countOfAllUniqueNgrams)) / (countOfNMinus1Grams + countOfAllUniqueNMinus1Grams);

Please note that one would have to adjust the n-gram-dependent counts as n gets larger.

References:
* http://web.mit.edu/6.863/www/fall2012/readings/ngrampages.pdf
* "Foundations of Statistical Natural Language Processing" by Christopher D. Manning, Hinrich Schütze: 42f, 197, 202
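To make the suggested smoothing concrete, here is a minimal Java sketch with toy counts. All names, counts, and the vocabulary size are assumptions for illustration; the formulas are standard add-one (Laplace) smoothing: P(w) = (C(w) + 1) / (N + V) for unigrams, and P(b|a) = (C(a b) + 1) / (C(a) + V) for the conditional, where V is the vocabulary size.

```java
import java.util.HashMap;
import java.util.Map;

public class SmoothingDemo {

    // Toy corpus statistics (assumed values for illustration only)
    static final Map<String, Long> unigrams = new HashMap<>();
    static final Map<String, Long> bigrams = new HashMap<>();
    static final long totalTokens = 10_000;  // N: total token count
    static final long vocabSize = 5_000;     // V: number of distinct unigrams

    static {
        unigrams.put("this", 120L);
        bigrams.put("this is", 40L);
    }

    // Add-one smoothed unigram probability: P(w) = (C(w) + 1) / (N + V)
    static double pUnigram(String w) {
        return (double) (unigrams.getOrDefault(w, 0L) + 1)
                / (totalTokens + vocabSize);
    }

    // Add-one smoothed conditional: P(b|a) = (C(a b) + 1) / (C(a) + V)
    static double pCond(String a, String b) {
        return (double) (bigrams.getOrDefault(a + " " + b, 0L) + 1)
                / (unigrams.getOrDefault(a, 0L) + vocabSize);
    }

    public static void main(String[] args) {
        // Chain rule: P("this","is") = P("this") * P("is"|"this")
        double p = pUnigram("this") * pCond("this", "is");
        System.out.println(p);
    }
}
```

The point of adding V (not 1) to the denominator is that the smoothed probabilities over the whole vocabulary still sum to 1; an unseen n-gram gets a small but nonzero probability instead of collapsing the whole chain product to 0.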
Re: ignoring certain tokens in rules
Hi,

In fact, the problem is a bit more complicated than I expected, because the disambiguation rules also need to ignore the tokens with quotation marks. So it would be necessary to add a lot of exceptions everywhere, and it would probably be unmanageable.

A more general solution:

- In AnalyzedSentence, remove tokens consisting only of quotation marks in getTokensWithoutWhitespace().
- Add two fields to AnalyzedTokenReadings: leftQuotationMark and rightQuotationMark, which contain the characters adjacent to the word (none, one side, or both sides).
- Run everything as usual with the new getTokensWithoutWhitespace() (disambiguation, grammar rules, etc.).
- Retrieve leftQuotationMark and rightQuotationMark when necessary, for example in suggestions.

Possible difficulties:

- GenericUnpairedBracketsRule must be modified accordingly.
- Perhaps some grammar and disambiguation rules should know about the quotation marks, and new attributes could be necessary (similar to spacebefore="yes/no").
- Whitespace in French.
- Other unexpected troubles.

Do you think this is a good approach? I can try to implement it, but I am not really sure it is worthwhile, because the problems it solves are relatively rare.

Regards,
Jaume Ortolà

2016-05-05 16:22 GMT+02:00 Jaume Ortolà i Font:
> Hi,
>
> I think Marcin talked about this idea some time ago.
>
> Sometimes tokens like quotation marks (or other characters) should be
> ignored in some rules. That is, the sentence should be checked as if the
> token were not present. Any idea about how it could be implemented?
>
> Alternatively, tokens like this one could be added to the patterns:
>
> [“‘”«"']
>
> I would need to modify a few dozen rules. But perhaps this is the best
> solution: it gives more control over the rule, the suggestions, possible
> false alarms, and so on. What do you think?
>
> Regards,
> Jaume Ortolà
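A minimal sketch of the stripping and re-attaching steps of the proposal above, i.e. splitting the quotation marks off into separate fields and restoring them when building a suggestion. The class and field names mirror the proposal (leftQuotationMark / rightQuotationMark) but this is hypothetical illustration code, not actual LanguageTool code:

```java
public class QuoteAwareToken {

    private static final String QUOTES = "“”‘’«»\"'";

    final String token;               // token with quotation marks removed
    final String leftQuotationMark;   // "" if none
    final String rightQuotationMark;  // "" if none

    QuoteAwareToken(String token, String left, String right) {
        this.token = token;
        this.leftQuotationMark = left;
        this.rightQuotationMark = right;
    }

    // Split a raw token into (left quotes, bare token, right quotes).
    static QuoteAwareToken fromRaw(String raw) {
        int start = 0, end = raw.length();
        while (start < end && QUOTES.indexOf(raw.charAt(start)) >= 0) start++;
        while (end > start && QUOTES.indexOf(raw.charAt(end - 1)) >= 0) end--;
        return new QuoteAwareToken(raw.substring(start, end),
                raw.substring(0, start), raw.substring(end));
    }

    // Re-attach the quotation marks, e.g. when generating a suggestion.
    String restore(String replacement) {
        return leftQuotationMark + replacement + rightQuotationMark;
    }

    public static void main(String[] args) {
        QuoteAwareToken t = QuoteAwareToken.fromRaw("«word»");
        // Rules see the bare token; suggestions get the quotes back.
        System.out.println(t.token + " | " + t.restore("words"));
    }
}
```

Rules and the disambiguator would then operate only on `token`, while suggestion building calls `restore()`, so a suggestion for «word» keeps its guillemets.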
Please help test Chrome extension beta
Hi,

please help test a new version of the Chrome extension:
http://forum.languagetool.org/t/please-help-test-chrome-extension-beta/847

Please also do this if Chrome is not your default browser, because soon
Firefox will be compatible with Chrome extensions and then this new
version will become the LT add-on for Firefox, too (Firefox 48 will
probably be the first version to be compatible enough).

Regards
 Daniel