On Feb 25, 2010, at 3:41 PM, Jake Mannix wrote:

> On Thu, Feb 25, 2010 at 12:38 PM, Robin Anil <robin.a...@gmail.com> wrote:
>
>> What's the largest dataset available? BixoLabs? Wikipedia (5 mil
>> articles)...
>> I don't know of anything public that is that big.
>>
>
> 5 million articles: if you take all the 1-, 2-, 3-, 4-, and 5-gram data out of it,
> you could easily hit more than 4B individual matrix entries.

Is it actually meaningful to do this (combine the various n-gram sizes) as an experiment, other than for sheer size?
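
The arithmetic behind the 4B figure: 4,000,000,000 entries / 5,000,000 articles = 800 nonzero entries per article on average. Here is a minimal Python sketch of that count. It assumes whitespace tokenization and treats each distinct 1- through 5-gram in an article as one nonzero entry in a document-by-n-gram matrix. The function names are hypothetical and do not come from Mahout.

```python
from collections import Counter

def ngram_entries(tokens, max_n=5):
    """Count distinct n-grams (n = 1..max_n) in one tokenized document.

    Each distinct n-gram is one nonzero entry in that document's row
    of a (hypothetical) document-by-n-gram matrix.
    """
    grams = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n across the token list.
        for i in range(len(tokens) - n + 1):
            grams[tuple(tokens[i:i + n])] += 1
    return len(grams)

def estimated_total_entries(num_docs, avg_entries_per_doc):
    """Total nonzero entries if every document has the average row size."""
    return num_docs * avg_entries_per_doc

doc = "the cat sat on the mat".split()
print(ngram_entries(doc))                           # → 19 (5 + 5 + 4 + 3 + 2)
print(estimated_total_entries(5_000_000, 800))      # → 4000000000
```

A six-token toy sentence already yields 19 entries. A full Wikipedia article has hundreds or thousands of tokens, so an average of 800 distinct n-grams per article is a modest assumption.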