Re: Please help test Chrome extension beta

2016-05-06 Thread Yakov Reztsov
Great!
But I get an error when installing this extension: "Cannot unzip".
I was able to install it after unpacking it and switching to developer mode in
Google Chrome (Win 8.1).

In this mode I see a warning: "Unrecognized manifest key 'applications'."
After that, the extension works fine.


>Friday, 6 May 2016, 16:19 +03:00, from Daniel Naber:
>
>Hi,
>
>please help test a new version of the Chrome extension:
>http://forum.languagetool.org/t/please-help-test-chrome-extension-beta/847
>
>Please do this even if Chrome is not your default browser, because soon 
>Firefox will be compatible with Chrome extensions and then this new 
>version will become the LT add-on for Firefox, too (Firefox 48 will 
>probably be the first version to be compatible enough).
>
>Regards
>  Daniel
>
>
>--
>Find and fix application performance issues faster with Applications Manager
>Applications Manager provides deep performance insights into multiple tiers of
>your business applications. It resolves application problems quickly and
>reduces your MTTR. Get your free trial!
>https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
>___
>Languagetool-devel mailing list
>Languagetool-devel@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/languagetool-devel


-- 

Yakov Reztsov


Re: Please help test Chrome extension beta

2016-05-06 Thread Andriy Rysin
I just tried this extension in Firefox 48.0a2: I had to turn off
signature checking to install it, and there's no Ukrainian translation, but
otherwise it works well.
One thing I noticed (and it looks like the Firefox extension behaves the
same way): if there's a spelling error and a grammar error on the same
word, both are shown. With the checking on the web page (e.g. on
http://languagetool.org), only the grammar error is shown (which I prefer).

The other thing is usability: currently the LT popup window is
transient, so if I want to fix the errors reported in the text field and
click on it, the LT window goes away. I would copy the errors
into some text editor, but the text there is not selectable.

Regards,
Andriy

2016-05-06 14:31 GMT-04:00 Yakov Reztsov :
> Great!
> But I get an error when installing this extension: "Cannot unzip".
> I was able to install it after unpacking it and switching to developer mode
> in Google Chrome (Win 8.1).
>
> In this mode I see a warning: "Unrecognized manifest key 'applications'."
> After that, the extension works fine.


Re: ignoring certain tokens in rules

2016-05-06 Thread Dominique Pellé
Jaume Ortolà i Font  wrote:

> Hi,
>
> I think Marcin talked about this idea some time ago.
>
> Sometimes tokens like quotation marks (or other characters) should be ignored
> in some rules. That is, the sentence should be checked as if the token were
> not present. Any idea how this could be implemented?
>
> Alternatively, tokens like this one should be added to the patterns:
>
> [“‘”«"']
>
> I would need to modify a few dozen rules. But perhaps this is the best
> solution: it gives more control over the rule, the suggestions, possible
> false alarms, and so on. What do you think?
>
> Regards,
> Jaume Ortolà

I have not looked in detail at what the French grammar checker
Grammalecte [1] does, but I think that it checks the input text
in multiple passes. In some passes, pre-processor rules eliminate
pieces of text. For example, the pre-processor can eliminate
"useless" punctuation or locutions made of multiple words.

For example, I see in Grammalecte pre-processor rules such as:

[«»“”„"`¹²³⁴⁵⁶⁷⁸⁹⁰]+ -> *
This rule eliminates a few "useless" characters.

[(]\w+[)] -> *
This rule eliminates a word in parentheses such as (foo).

The important thing to keep in mind is that the sentence is checked
multiple times. For example:
* first pass checks the text as-is.
* second pass checks the text again, after applying pre-processor rules.

It seems like a good idea.
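
A minimal Java sketch of the two-pass idea described above. It is not
Grammalecte's implementation; the class and method names are made up for
illustration. The first pattern is the character rule quoted above, written
with Unicode escapes. The second is widened from [(]\w+[)] to [(][^()]*[)]
(an assumption) so that multi-word parentheses are covered too. Matches are
overwritten with spaces rather than deleted (also an assumption), so that
character offsets found in the second pass still line up with the original
text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TwoPassPreprocessor {

  // Hypothetical pre-processor rules, modeled on the two rules quoted above.
  private static final List<Pattern> RULES = List.of(
      // quotation marks, backtick and superscript digits
      Pattern.compile("[\u00AB\u00BB\u201C\u201D\u201E\"`\u00B9\u00B2\u00B3"
          + "\u2074\u2075\u2076\u2077\u2078\u2079\u2070]+"),
      // any parenthesized text without nested parentheses (widened, see note)
      Pattern.compile("[(][^()]*[)]"));

  /** Overwrites every match with spaces, so the text length does not change. */
  static String preprocess(String text) {
    String result = text;
    for (Pattern rule : RULES) {
      result = rule.matcher(result)
          .replaceAll(match -> " ".repeat(match.end() - match.start()));
    }
    return result;
  }

  /** Texts to check: pass 1 is the text as-is, pass 2 the pre-processed text. */
  static List<String> passes(String text) {
    List<String> out = new ArrayList<>();
    out.add(text);
    String cleaned = preprocess(text);
    if (!cleaned.equals(text)) {
      out.add(cleaned); // a second pass is only useful if something changed
    }
    return out;
  }

  public static void main(String[] args) {
    for (String pass : passes("He said \u00ABhello\u00BB (more or less) to me.")) {
      System.out.println(pass);
    }
  }
}
```

A real checker would run its rules on each returned string and would have to
merge the matches from both passes, dropping duplicates.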

Regards
Dominique

[1] http://www.dicollecte.org/grammalecte/



Re: probability theory code review?

2016-05-06 Thread Silvan Jegen
Hi

Sadly, my math is weak but I will give it a try. Just make sure to
re-check :)

On Thu, Aug 06, 2015 at 11:29:05AM +0200, Daniel Naber wrote:
> we're using a bit of probability theory to calculate ngram probabilities. 
> This way we can decide which word of a homophone pair like there/their 
> is (probably) correct. Is anybody here familiar with probability theory 
> and could review that code? The main part is here:
> 
> https://github.com/languagetool-org/languagetool/blob/master/languagetool-core/src/main/java/org/languagetool/languagemodel/BaseLanguageModel.java#L41

(I updated the link since this mail is late...)

Below is the relevant function in its full form.

> + Probability getPseudoProbability(List<String> context) {
> + int maxCoverage = 0;
> + int coverage = 0;
> + long firstWordCount = lm.getCount(context.get(0));
> + maxCoverage++;

Off topic: This variable could be initialized to 1 directly on the first line of
the function.


> + if (firstWordCount > 0) {
> +   coverage++;
> + }
> + // chain rule:

The chain rule is

P(A,B,C,...) = P(A) * P(B|A) * P(C|A, B) * ...

So the line below would be P(A)

> + double p = (double) (firstWordCount + 1) / (totalTokenCount + 1);

which looks okay, but (assuming you are going for Laplace add-one
smoothing) you would have to add not 1 to "totalTokenCount" but the
vocabulary size of the n-gram model (i.e. the number of unique n-grams, which
for unigrams means the number of unique "syntactic words").

Another smoothing approach *may* work better.


> + debug("P for %s: %.20f (%d)\n", context.get(0), p, firstWordCount);
> + for (int i = 2; i <= context.size(); i++) {
> +   List<String> subList = context.subList(0, i);
> +   long phraseCount = lm.getCount(subList);
> +   double thisP = (double) (phraseCount + 1) / (firstWordCount + 1);

This would be the place where the conditional probabilities within the chain
are calculated. A conditional probability can be calculated as follows.

P(B|A) = P(A,B)/P(A)

Using the conditional probability of token "is" given token "this"
as an example it would look like this.

P("is"|"this") = P("this","is")/P("this")

where

P("this","is") = C("this is")/C(all 2-grams)

( C() denotes the count of the argument )

so I would have expected something like

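+ double thisP = ((double) (Ngramcount + 1) / (countofallNgrams + countofalluniqueNgrams))
+     / ((double) (N-1gramcount + 1) / (countofallN-1grams + countofalluniqueN-1grams));

(the N-1gram being the n-gram without its last token, so that the divisor is
the smoothed P(A) from the formula above)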

Please note that one would have to adjust the n-gram-dependent counts
for different ns as n gets larger.
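
A minimal Java sketch of the chain rule with add-one smoothing, for
illustration only. It is not LanguageTool's BaseLanguageModel: the class
name, the in-memory count table and the toy corpus are assumptions. It uses
the common textbook form of the smoothed conditional,
P(w_i | history) = (C(history, w_i) + 1) / (C(history) + V), rather than the
ratio of two smoothed joint probabilities written above, and it uses the
number of unique tokens as V for every n (a simplification). Note that each
factor divides by the count of the preceding history, not by the count of the
first word.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AddOneChainRule {

  private final Map<List<String>, Long> counts = new HashMap<>();
  private final long totalTokenCount;
  private final long vocabularySize;

  /** Counts all n-grams up to maxN in a tokenized toy corpus. */
  AddOneChainRule(List<String> corpus, int maxN) {
    totalTokenCount = corpus.size();
    vocabularySize = corpus.stream().distinct().count();
    for (int n = 1; n <= maxN; n++) {
      for (int i = 0; i + n <= corpus.size(); i++) {
        counts.merge(List.copyOf(corpus.subList(i, i + n)), 1L, Long::sum);
      }
    }
  }

  long count(List<String> ngram) {
    return counts.getOrDefault(ngram, 0L);
  }

  /** P(w1..wn) = P(w1) * P(w2|w1) * P(w3|w1,w2) * ..., each factor smoothed. */
  double probability(List<String> context) {
    // P(A) = (C(A) + 1) / (N + V)
    double p = (double) (count(context.subList(0, 1)) + 1)
        / (totalTokenCount + vocabularySize);
    for (int i = 2; i <= context.size(); i++) {
      long phraseCount = count(context.subList(0, i));
      long historyCount = count(context.subList(0, i - 1));
      // P(w_i | history) = (C(history, w_i) + 1) / (C(history) + V)
      p *= (double) (phraseCount + 1) / (historyCount + vocabularySize);
    }
    return p;
  }

  public static void main(String[] args) {
    List<String> corpus = List.of("this", "is", "a", "test", "this", "is", "fun");
    AddOneChainRule lm = new AddOneChainRule(corpus, 3);
    System.out.println(lm.probability(List.of("this", "is")));
    System.out.println(lm.probability(List.of("fun", "this")));
  }
}
```

With real n-gram data the counts would come from the language model index,
and a product of many small factors is usually accumulated as a sum of
logarithms to avoid underflow.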

References:
* http://web.mit.edu/6.863/www/fall2012/readings/ngrampages.pdf
* "Foundations of Statistical Natural Language Processing" by Christopher D.
  Manning and Hinrich Schütze, pp. 42f, 197, 202



Re: ignoring certain tokens in rules

2016-05-06 Thread Jaume Ortolà i Font
Hi,

In fact, the problem is a bit more complicated than I expected because the
disambiguation rules also need to ignore the tokens with quotation marks.
So it would be necessary to add a lot of  everywhere and
it would probably be unmanageable.

A more general solution:
- In AnalyzedSentence, remove the tokens that contain only quotation marks
from the result of getTokensWithoutWhitespace().
- Add two fields to AnalyzedTokenReadings: leftQuotationMark,
rightQuotationMark, which contain the characters adjacent to the word
(none, one side or both sides).
- Run everything as usual with the new
getTokensWithoutWhitespace (disambiguation, grammar rules, etc.).
- Retrieve leftQuotationMark, rightQuotationMark when necessary, for
example in suggestions (i.e.).
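
A rough Java sketch of the steps above, to make the idea concrete. The Token
class is a hypothetical stand-in for AnalyzedTokenReadings, and
tokensWithoutQuotes() only imitates what a modified
getTokensWithoutWhitespace() might return; none of this is existing
LanguageTool code. The set of quotation marks is the one from the pattern in
the first mail. A quotation mark is treated as a right mark when it directly
follows a word and as a left mark otherwise, which is an assumption and does
not handle quotation marks separated by spaces, as in French.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteStrippingSketch {

  /** Hypothetical stand-in for AnalyzedTokenReadings with the two new fields. */
  static final class Token {
    final String text;
    String leftQuotationMark = "";
    String rightQuotationMark = "";

    Token(String text) {
      this.text = text;
    }
  }

  // The characters from the pattern in the first mail, as Unicode escapes.
  private static final String QUOTES = "\u201C\u2018\u201D\u00AB\"'";

  static boolean isQuotationMarkOnly(String token) {
    return !token.isEmpty() && token.chars().allMatch(c -> QUOTES.indexOf(c) >= 0);
  }

  /** Drops whitespace and quote-only tokens, remembering the quotes on neighbors. */
  static List<Token> tokensWithoutQuotes(List<String> rawTokens) {
    List<Token> result = new ArrayList<>();
    String pendingLeft = "";
    boolean directlyAfterWord = false;
    for (String raw : rawTokens) {
      if (raw.isBlank()) {
        directlyAfterWord = false;
      } else if (isQuotationMarkOnly(raw)) {
        if (directlyAfterWord) {
          // closing mark: attach it to the word just before it
          result.get(result.size() - 1).rightQuotationMark += raw;
        } else {
          // opening mark: keep it for the next word
          pendingLeft += raw;
        }
      } else {
        Token token = new Token(raw);
        token.leftQuotationMark = pendingLeft;
        pendingLeft = "";
        result.add(token);
        directlyAfterWord = true;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> raw = List.of("He", " ", "said", " ", "\u201C", "hello", "\u201D", ".");
    for (Token t : tokensWithoutQuotes(raw)) {
      System.out.println(t.leftQuotationMark + t.text + t.rightQuotationMark);
    }
  }
}
```

Rules would then match against the returned tokens only, and a suggestion
could put leftQuotationMark and rightQuotationMark back around the replaced
word.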

Possible difficulties:
- GenericUnpairedBracketsRule must be modified accordingly.
- Perhaps some grammar and disambiguation rules should know about the
quotation marks and new attributes could be necessary (similar to
spacebefore="yes/no").
- Whitespace in French.
- Other unexpected troubles.

Do you think this is a good approach?

I can try to implement it, but I am not really sure if it is worthwhile
because the problems it solves are relatively rare.

Regards,
Jaume Ortolà



2016-05-05 16:22 GMT+02:00 Jaume Ortolà i Font :

> Hi,
>
> I think Marcin talked about this idea some time ago.
>
> Sometimes tokens like quotation marks (or other characters) should be ignored
> in some rules. That is, the sentence should be checked as if the token were
> not present. Any idea how this could be implemented?
>
> Alternatively, tokens like this one should be added to the patterns:
>
> [“‘”«"']
>
> I would need to modify a few dozen rules. But perhaps this is the best
> solution: it gives more control over the rule, the suggestions, possible
> false alarms, and so on. What do you think?
>
> Regards,
> Jaume Ortolà


Please help test Chrome extension beta

2016-05-06 Thread Daniel Naber
Hi,

please help test a new version of the Chrome extension:
http://forum.languagetool.org/t/please-help-test-chrome-extension-beta/847

Please do this even if Chrome is not your default browser, because soon 
Firefox will be compatible with Chrome extensions and then this new 
version will become the LT add-on for Firefox, too (Firefox 48 will 
probably be the first version to be compatible enough).

Regards
  Daniel

