Marcin Miłkowski wrote thus at 06:04 PM 10-09-13:
>On 2013-09-10 10:06, Kumara Bhikkhu wrote:
> > Dear friends,
> >
> > Rules are tested against an amply large sample of Wikipedia articles.
> > That's great. But it also limits the testing to encyclopedic
> > language. As some errors tend to occur more often in conversational
> > writing, I wonder if it's possible to extend the sample to such texts.
>
>Yes, but we need to have access to a fairly big corpus. I know there was
>a freely available blog corpus but only for English.

Agreed. With blogs, we get more casual language.
Do they have to be under a free license? Large
dictionary publishers build their corpora from
commercial literature too, right?

>But in principle, we could have additional corpora added for our testing
>purposes.

If adding to the existing one might make the
process too lengthy, I suggest having separate
tests for different corpora, like a button for
Wikipedia and another for Open Library or the blogosphere.

kb 


_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
