Big in size, etc
Hope this is not too greedy.

<rule id="BIG_IN_SIZE" name="Big in size, etc">
  <pattern>
    <token postag_regexp="yes" postag="JJ|JJR|JJS"><exception regexp="yes">similar|alone</exception><exception postag="VBN"/></token>
    <token>in</token>
    <token regexp="yes">size|duration|color|colour|number|shape|height|nature|length|weight</token>
  </pattern>
  <message>A more concise phrase may lose no meaning and sound more powerful.</message>
  <suggestion>\1</suggestion>
  <short>Possible redundancy</short>
  <example type="correct">The man is big.</example>
  <example type="correct">His speech was briefest.</example>
  <example type="correct">He's absorbed in thought.</example>
  <example correction="big" type="incorrect">The man is <marker>big in size</marker>.</example>
  <example correction="briefest" type="incorrect">His speech was <marker>briefest in duration</marker>.</example>
  <example correction="redder" type="incorrect">My car is <marker>redder in color</marker>.</example>
  <example correction="few" type="incorrect">Her friends were <marker>few in number</marker>.</example>
</rule>

--
WatchGuard Dimension instantly turns raw network data into actionable security intelligence. It gives you real-time visual feedback on key security issues and trends. Skip the complicated setup - simply import a virtual appliance and go from zero to informed in seconds.
http://pubads.g.doubleclick.net/gampad/clk?id=123612991iu=/4140/ostg.clktrk
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
They got it right; they got it wrong.
False alarm: "They got it right; they got it wrong."

"right" and "wrong" are wrongly flagged.

kb
Re: How to use the lemmatizer
Hi,

thanks for the feedback. I'm really only interested in getting convenient access to the dictionaries, so that I can use them for lemmatization. For this particular task, I'm not using any other functionality from LanguageTool, including grammatical rules.

So here is what I do:
- run a probabilistic POS tagger
- feed the tokens of my text to the LanguageTool tagger to get all dictionary entries
- find a match between the POS tag created by the probabilistic tagger and the returned dictionary entries
- if there is a match, use the respective lemma

Matching is the most annoying part, because the tagset used by the probabilistic tagger may not be the same as the one used in the LanguageTool dictionary. So now I try three matching approaches:
- checking if the POS tag from the tagger and the one from the dictionary are exactly the same
- checking if the POS tag from the tagger is the same as the first element of the dictionary tag (splitting by ':')
- using mapping tables to map both the tag from the POS tagger and the tag from the dictionary to a coarse-grained scheme of word classes, and seeing if they match there

Seems to work quite OK.

Cheers,

-- Richard

On 27.01.2014, at 22:38, Marcin Miłkowski list-addr...@wp.pl wrote:

> Hello,
>
> On 2014-01-27 15:44, Richard Eckart de Castilho wrote:
>> Hello everybody,
>>
>> I may be totally wrong, but I believe the lemmatizers in LanguageTool are implemented based on dictionaries. I suppose a dictionary entry would be made up of a form, a lemma, and a POS tag. Assuming this is correct, is there a simple way to do a lookup in such a dictionary? Also, is there a way to find out which tagsets are used by these dictionaries (or maybe there is even some standard in LanguageTool, e.g. verbs are always V and nouns are always N or something like that)?
>>
>> I would like a method that accepts an inflected form and a POS tag and that returns a single lemma. Currently, I am doing this, but it seems a bit awkward.
>>
>>   // look up all dictionary readings for the tokens
>>   List<AnalyzedTokenReadings> rawTaggedTokens = lang.getTagger().tag(tokenText);
>>   // wrap the readings in a sentence so the disambiguator can process them
>>   AnalyzedSentence as = new AnalyzedSentence(
>>       rawTaggedTokens.toArray(new AnalyzedTokenReadings[rawTaggedTokens.size()]));
>>   as = lang.getDisambiguator().disambiguate(as);
>>   String best = getMostFrequentLemma(as.getTokens()[i]);
>>
>> In particular, I would like to use a different POS tagger. I have various statistical POS taggers at my disposal that produce a single POS per token - and that is what I want. The LanguageTool POS tagger produces multiple unranked POS tags per token.
>
> Beware that statistical POS taggers will necessarily obfuscate non-grammatical material, as they try to guess the correct tags. This makes them quite useless for writing rules. We've been there, tried that. I haven't yet found a decent English POS tagger, for example, that would be useful.
>
> Note however that if you have frequency info, you can add it to your tagger dictionary. And we indeed can do so using typing frequency lists, so you'd be able to assign the most frequent lemma if you need, I guess. The procedure is described here:
>
> http://wiki.languagetool.org/hunspell-support
>
> See under "including frequency data".
>
> Regards,
> Marcin
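The three matching approaches Richard lists could be sketched roughly as below. This is only an illustration under assumptions: the `Reading` class, the `pickLemma` method and both mapping tables are hypothetical names and not LanguageTool API, the tag names are made up, the approaches are assumed to be tried from strictest to loosest, and the coarse mapping is assumed to be keyed on the first element of the dictionary tag.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class LemmaMatcher {

    /** One dictionary entry for a token: a lemma plus the dictionary's POS tag (hypothetical type). */
    static final class Reading {
        final String lemma;
        final String posTag;

        Reading(String lemma, String posTag) {
            this.lemma = lemma;
            this.posTag = posTag;
        }
    }

    // Hypothetical mapping tables; real ones depend on the tagsets actually in use.
    static final Map<String, String> TAGGER_TO_COARSE =
            Map.of("NN", "NOUN", "NNS", "NOUN", "VB", "VERB", "VBD", "VERB", "JJ", "ADJ");
    static final Map<String, String> DICT_TO_COARSE =
            Map.of("SUB", "NOUN", "VER", "VERB", "ADJ", "ADJ");

    /** Returns the lemma of the first reading that agrees with the tagger's tag, if any. */
    static Optional<String> pickLemma(String taggerTag, List<Reading> readings) {
        // Approach 1: tagger tag and dictionary tag are exactly the same.
        for (Reading r : readings) {
            if (r.posTag.equals(taggerTag)) {
                return Optional.of(r.lemma);
            }
        }
        // Approach 2: tagger tag equals the first ':'-separated element of the dictionary tag.
        for (Reading r : readings) {
            if (firstElement(r.posTag).equals(taggerTag)) {
                return Optional.of(r.lemma);
            }
        }
        // Approach 3: both tags map to the same coarse word class.
        String coarse = TAGGER_TO_COARSE.get(taggerTag);
        if (coarse != null) {
            for (Reading r : readings) {
                if (coarse.equals(DICT_TO_COARSE.get(firstElement(r.posTag)))) {
                    return Optional.of(r.lemma);
                }
            }
        }
        return Optional.empty();
    }

    /** Part of the tag before the first ':' (the whole tag if there is no ':'). */
    static String firstElement(String tag) {
        int colon = tag.indexOf(':');
        return colon < 0 ? tag : tag.substring(0, colon);
    }
}
```

Returning an `Optional` keeps the "no match" case explicit, so the caller can decide whether to fall back to the surface form or to some other heuristic.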
GSoC 2014
Hi,

there's a Google Summer of Code again this year. If we decide to apply as a mentoring organization, the most important thing is to update this page:

http://wiki.languagetool.org/missing-features

Please update it anyway and add your ideas, or just post them here if you don't have a Wikidot account. So should we apply?

The GSoC timeline is at http://www.google-melange.com/gsoc/events/google/gsoc2014.

Regards
Daniel
Re: They got it right; they got it wrong.
On 2014-01-29 10:50, Kumara Bhikkhu wrote:
> False alarm: "They got it right; they got it wrong."
> "right" and "wrong" are wrongly flagged.

Yes, I fixed it some time ago. I made really serious changes in disambiguation, so please use the daily builds:

https://www.languagetool.org/download/snapshots/?C=M;O=D

Best,
Marcin
Re: Try this
On 2014-01-29 03:56, Kumara Bhikkhu wrote:
> Marcin Miłkowski wrote thus at 08:06 PM 28-01-14:
>> Try:
>>   <token inflected="yes">conduct</token>
>> instead.
>
> Thanks for the tip. I was wondering about it.
>
>> I tried your rule using our rule editor:
>> http://community.languagetool.org/ruleEditor/expert
>> and we have lots of matches, but some of them are quite useless, for example:
>>
>> Mozart's Davide penitente (1785), his Piano Concerto KV 482 (1785), the Clarinet Quintet (1789) and the 40th Symphony (1788) had been premiered on the suggestion of Salieri, who supposedly conducted a performance of it in 1791. (wikipedia)
>
> It's relevant actually. It can be revised as "who supposedly performed it in 1791". Unnecessary nominalisation is so common these days that most people put up with it to the point of ignoring it.

I'm afraid you're wrong. Salieri couldn't perform the 40th Symphony by himself. He was the conductor of the performance, so the only correct way to say this is that he conducted a performance of the symphony, rather than that he performed it. It makes no sense to think that a single person performs a symphony! ;)

> Students often use it to make their writing sound more pompous, which is counterproductive.

I agree, this is pompous in many cases.

>> You can use the following queries on corpus.byu.edu corpora:
>>
>>   conduct a|an|the|no [n*] of|into
>>   conduct a|an|the|no [n*] * of|into
>>   conduct a|an|the|no [n*] * * of|into
>>   conduct a|an|the|no [n*] * * * of|into
>
> Yes, I'm using that (minus "the", which suggests that the noun refers to a specific thing mentioned earlier, so the nominalisation is necessary).

Yeah, you're right. As you can see there, we should also add relevant exceptions to your rule ('deal' from "a great deal", 'range', etc.). So could you please look at the corpora results again?

Best,
Marcin
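The difference made by the `inflected="yes"` attribute suggested in this message can be pictured with a small sketch: a plain token compares the word as written, while an inflected token compares the word's base form. The class name, method names and the tiny lemma table below are hypothetical stand-ins for a real tagger dictionary, not LanguageTool code.

```java
import java.util.Locale;
import java.util.Map;

class InflectedMatchSketch {

    // Hypothetical mini lemma table standing in for a real tagger dictionary.
    static final Map<String, String> LEMMAS = Map.of(
            "conduct", "conduct",
            "conducts", "conduct",
            "conducted", "conduct",
            "conducting", "conduct");

    /** Plain token match: the surface form itself must equal the pattern word. */
    static boolean matchesSurface(String word, String pattern) {
        return word.equalsIgnoreCase(pattern);
    }

    /** Inflected match: the word's lemma must equal the pattern word. */
    static boolean matchesInflected(String word, String pattern) {
        String lemma = LEMMAS.get(word.toLowerCase(Locale.ROOT));
        return lemma != null && lemma.equalsIgnoreCase(pattern);
    }
}
```

With this table, "conducted" fails the plain match against "conduct" but passes the inflected one, which is why the Salieri sentence is picked up once the attribute is set.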
Re: Big in size, etc
On 2014-01-29 10:35, Kumara Bhikkhu wrote:
> Hope this is not too greedy.
>
> <rule id="BIG_IN_SIZE" name="Big in size, etc">
>   <pattern>
>     <token postag_regexp="yes" postag="JJ|JJR|JJS"><exception regexp="yes">similar|alone</exception><exception postag="VBN"/></token>
>     <token>in</token>
>     <token regexp="yes">size|duration|color|colour|number|shape|height|nature|length|weight</token>
>   </pattern>
>   <message>A more concise phrase may lose no meaning and sound more powerful.</message>
>   <suggestion>\1</suggestion>
>   <short>Possible redundancy</short>
>   <example type="correct">The man is big.</example>
>   <example type="correct">His speech was briefest.</example>
>   <example type="correct">He's absorbed in thought.</example>
>   <example correction="big" type="incorrect">The man is <marker>big in size</marker>.</example>
>   <example correction="briefest" type="incorrect">His speech was <marker>briefest in duration</marker>.</example>
>   <example correction="redder" type="incorrect">My car is <marker>redder in color</marker>.</example>
>   <example correction="few" type="incorrect">Her friends were <marker>few in number</marker>.</example>
> </rule>

Thanks! Yes, this is much better, but we also match:

  rich in color
  few in number
  possible in Nature,
  corresponding in number
  religious in nature
  efficient in shape
  autocratic in nature,
  equal in size
  beautiful in color,
  next in size

Out of 16 matches on the Brown corpus, the useful ones were "brown in color", "tawny in color" and "two in number". I added the adjectives above as additional exceptions and we'll see how it performs. I'm not sure about 'few'. We'll probably see more matches in the nightly diff today.

Regards,
Marcin
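The pattern in the quoted rule can be read as a three-token match: an adjective that is not excepted, the word "in", and one of the listed nouns, with the adjective alone offered as the suggestion. The sketch below mimics that logic on pre-tagged tokens. It is not how LanguageTool evaluates rules internally; the `Token` class and the method names are hypothetical, matching is assumed to be case-insensitive, and the VBN exception is assumed to fire whenever any reading of the token carries that tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class BigInSizeSketch {

    /** A word plus every POS tag assigned to it (hypothetical type, not LanguageTool's). */
    static final class Token {
        final String text;
        final Set<String> posTags;

        Token(String text, String... posTags) {
            this.text = text;
            this.posTags = Set.of(posTags);
        }
    }

    static final Set<String> ADJECTIVE_TAGS = Set.of("JJ", "JJR", "JJS");
    // Word exceptions from the rule; further adjectives could be added here.
    static final Set<String> EXCEPTION_WORDS = Set.of("similar", "alone");
    static final Set<String> DIMENSION_NOUNS = Set.of(
            "size", "duration", "color", "colour", "number",
            "shape", "height", "nature", "length", "weight");

    /** First token of the pattern: has an adjective tag and hits neither exception. */
    static boolean isFlaggableAdjective(Token t) {
        boolean adjective = t.posTags.stream().anyMatch(ADJECTIVE_TAGS::contains);
        boolean excepted = EXCEPTION_WORDS.contains(lower(t.text)) || t.posTags.contains("VBN");
        return adjective && !excepted;
    }

    /** Returns the suggested replacement (the bare adjective) for each match in the sentence. */
    static List<String> suggestions(List<Token> sentence) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 2 < sentence.size(); i++) {
            Token adjective = sentence.get(i);
            Token preposition = sentence.get(i + 1);
            Token noun = sentence.get(i + 2);
            if (isFlaggableAdjective(adjective)
                    && lower(preposition.text).equals("in")
                    && DIMENSION_NOUNS.contains(lower(noun.text))) {
                result.add(adjective.text);
            }
        }
        return result;
    }

    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }
}
```

False alarms such as "rich in color" would pass this check as well, which is the reason for extending the exception list as discussed in the message.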
homepage usability test
Hi,

I ran a usability test on our homepage using http://rapidusertests.com. They let users use the website and record that as a video. It's in German, but if you're interested in the results, let me know and I'll send you a link.

Users' complaints/remarks were:
- lack of an About page
- errors are lost when switching to full-screen mode and back
- you need to re-check the text after correcting an error manually
- sometimes they don't get that the demo text is a demo and thus contains errors on purpose

Regards
Daniel
Re: Big in size, etc
Of what you listed, these are rightly flagged:
* few in number
* religious in nature
* autocratic in nature,

Removing the 2nd and 3rd tokens wouldn't cause any loss of meaning.

kb

Marcin Miłkowski wrote thus at 01:33 AM 30-01-14:
> Thanks! Yes, this is much better, but we also match:
>
>   rich in color
>   few in number
>   possible in Nature,
>   corresponding in number
>   religious in nature
>   efficient in shape
>   autocratic in nature,
>   equal in size
>   beautiful in color,
>   next in size
>
> Out of 16 matches on the Brown corpus, the useful ones were "brown in color", "tawny in color" and "two in number". I added the adjectives above as additional exceptions and we'll see how it performs. I'm not sure about 'few'. We'll probably see more matches in the nightly diff today.
>
> Regards,
> Marcin
Re: homepage usability test
Kumara Bhikkhu wrote thus at 12:52 PM 30-01-14:
Daniel Naber wrote thus at 06:34 AM 30-01-14:
> - sometimes they don't get that the demo text is a demo and thus contains errors on purpose

I too had the same experience, though not for the demo text on the old homepage. I think revising the text with "Check this text to see an Or paste your own text here." would minimise the possibility.

kb