>> I need to start off with this project where we can find the ranking of
>> controversial articles. Could anyone kindly help me how to start?

Check out the Wikipedia "logging" dumps, which contain the reasons for actions on page titles (including IP blocks and deletes) but without the bulk of the full-text changes, e.g.
http://download.wikimedia.org/enwiki/20090827/enwiki-20090827-pages-logging.xml.gz

Once you get this into Lucene, "Luke" can help you explore and pinpoint the key target pages for vandalism.

Cheers,
Mark

----- Original Message ----
From: Sahi <sahilkaus...@hotmail.com>
To: java-user@lucene.apache.org
Sent: Wednesday, 2 September, 2009 5:09:15
Subject: Deletion of words in articles of Wikipedia

Hi,

I'm new to this site. My question is: Articles in Wikipedia can be edited by everyone and may or may not be accurate. If one contributor writes an article and another contributor then deletes certain content in that article, that would indicate that the article is controversial. I need to start off with this project where we can find the ranking of controversial articles. Could anyone kindly help me how to start?

Thanks

--
View this message in context: http://www.nabble.com/Deletion-of-words-in-articles-of-Wikipedia-tp25251378p25251378.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
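
Following up on the logging-dump suggestion above, here is a minimal sketch of a first pass over that file before anything goes into Lucene. It is not from the original thread. It assumes the dump is a sequence of `<logitem>` elements with `<type>` and `<logtitle>` children, which should be checked against the actual file. The function name `rank_titles` and the choice of "delete" and "protect" as controversy signals are hypothetical, and counting log entries per title is only a crude proxy for controversy.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: these log types are treated as signs of controversy.
CONTROVERSY_TYPES = {"delete", "protect"}


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}logitem' -> 'logitem'."""
    return tag.rsplit("}", 1)[-1]


def rank_titles(xml_stream, types=CONTROVERSY_TYPES):
    """Count matching log entries per page title, most frequent first."""
    counts = Counter()
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local_name(elem.tag) != "logitem":
            continue
        # Collect the direct children of this log entry by local tag name.
        fields = {local_name(child.tag): (child.text or "") for child in elem}
        if fields.get("type") in types and fields.get("logtitle"):
            counts[fields["logtitle"]] += 1
        elem.clear()  # drop the entry's children to limit memory use
    return counts.most_common()


# Tiny hand-written sample in the assumed layout (not real dump data).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <logitem><type>delete</type><action>delete</action><logtitle>Foo</logtitle></logitem>
  <logitem><type>protect</type><action>protect</action><logtitle>Foo</logtitle></logitem>
  <logitem><type>block</type><action>block</action><logtitle>User:1.2.3.4</logtitle></logitem>
  <logitem><type>delete</type><action>restore</action><logtitle>Bar</logtitle></logitem>
</mediawiki>"""

ranking = rank_titles(io.BytesIO(SAMPLE.encode("utf-8")))
print(ranking)
```

For the real dump, the same function should accept the file object returned by `gzip.open("enwiki-20090827-pages-logging.xml.gz", "rb")`. The same per-entry fields (type, title, comment) could then become the fields of one Lucene document per log entry, which is what would make the index browsable in Luke.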