On Feb 25, 2010, at 3:41 PM, Jake Mannix wrote:

> On Thu, Feb 25, 2010 at 12:38 PM, Robin Anil <robin.a...@gmail.com> wrote:
> 
>> What's the largest dataset available? BixoLabs? Wikipedia (5 million
>> articles)...
>> I don't know of anything public that is that big.
>> 
> 
> 5 million articles, if you take all the 1-, 2-, 3-, 4-, and 5-gram data out
> of it, you could easily hit more than 4B individual matrix entries.

Is it actually meaningful to combine the various n-gram sizes as an experiment,
other than for sheer size?
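
For a sense of where a figure like that comes from: 4 billion entries spread over 5 million articles works out to an average of 800 distinct n-grams per article. The sketch below is only a rough illustration, assuming whitespace tokenization and a document-by-n-gram matrix in which each distinct (document, n-gram) pair is one nonzero entry. The function names are hypothetical and not taken from any library.

```python
def ngrams(tokens, n):
    """Return the contiguous n-grams of `tokens` as a list of tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_matrix_entries(docs, max_n=5):
    """Count nonzero entries in a document-by-n-gram matrix.

    Assumption: one row per document, one column per distinct n-gram
    (n = 1..max_n), so each distinct n-gram in a document is one entry.
    """
    entries = 0
    for text in docs:
        tokens = text.lower().split()  # naive whitespace tokenizer
        distinct = set()
        for n in range(1, max_n + 1):
            distinct.update(ngrams(tokens, n))
        entries += len(distinct)
    return entries


# Toy corpus, for illustration only.
docs = ["the cat sat on the mat", "the dog sat"]
print(count_matrix_entries(docs))

# Average distinct n-grams per article implied by the figures in the thread.
print(4_000_000_000 / 5_000_000)
```

Combining the sizes this way simply unions the columns for each n, which is why the entry count grows so quickly with max_n.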
