Joe,

Thanks for getting started on this work.
On Fri, Aug 13, 2010 at 8:38 AM, Joe Kumar <[email protected]> wrote:
>
> For wikipedia bayes example, I am assuming that we need to download data
> (like how we are doing for Twenty Newsgroup example). can someone plz
> reference me the link or the process of getting this data ?
>

See: http://en.wikipedia.org/wiki/Wikipedia:Database_download

The full link is:
http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

WikipediaXmlSplitter can read the .bz2 file directly.

Drew
