Joe,

Thanks for getting started on this work.

On Fri, Aug 13, 2010 at 8:38 AM, Joe Kumar <[email protected]> wrote:

>
> For the Wikipedia Bayes example, I am assuming that we need to download data
> (as we do for the Twenty Newsgroups example). Can someone please point me to
> the link or the process for getting this data?
>

See: http://en.wikipedia.org/wiki/Wikipedia:Database_download

The full link is:
http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

WikipediaXmlSplitter is capable of reading the bz2 format file directly.
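
If you want to sanity-check the download before running the splitter, here is a minimal Python sketch (not Mahout code) that streams page titles straight out of the compressed dump without unpacking it. The function name iter_page_titles and its limit parameter are made up for illustration, and it assumes the dump uses the usual MediaWiki layout of <page> elements containing a <title>.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_page_titles(dump_path, limit=5):
    """Yield up to `limit` page titles from a bz2-compressed MediaWiki XML dump.

    Hypothetical helper for illustration; not part of Mahout.
    """
    count = 0
    # bz2.open decompresses on the fly, so the archive is never unpacked on disk.
    with bz2.open(dump_path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # Dump elements carry an XML namespace; compare on the local name only.
            local_name = elem.tag.rsplit("}", 1)[-1]
            if local_name == "title":
                yield elem.text
                count += 1
                if count >= limit:
                    return
            elif local_name == "page":
                # Drop the finished page's subtree to limit memory use.
                elem.clear()
```

Calling list(iter_page_titles("enwiki-latest-pages-articles.xml.bz2")) on the downloaded file should return the first few titles, which is a quick way to confirm the archive is intact and readable.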

Drew
