Marcin Miłkowski wrote thus at 06:04 PM 10-09-13:
>On 2013-09-10 10:06, Kumara Bhikkhu wrote:
> > Dear friends,
> >
> > Rules are tested against an amply large sample of Wikipedia articles.
> > That's great. But it also limits the testing on encyclopedic language
> > only. As some errors tend to happen more in conversational writing, I
> > wonder if it's possible to extend the sample to such texts.
>
>Yes, but we need to have access to a fairly big corpus. I know there was
>a freely available blog corpus but only for English.

Agreed. With blogs, we get more casual language. Do they have to be under a free license? Large dictionary publishers build their corpora from commercial literature too, right?

>But in principle, we could have additional corpora added for our testing
>purposes.

If adding to the existing one might make the process too lengthy, I suggest having separate tests for different corpora, like a button for Wikipedia and another for Open Library, or the blogosphere.

kb

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel