Hi all, Thanks for the replies.
I'm using hadoop-0.18.3. We are indexing the ClueWeb09 dataset in a way very similar to Nutch. Reduce creates a document (similar to a Lucene document, and implementing Writable) by adding the fields generated by Map. This document is put into the output collector of Reduce.

The document is then indexed and copied to local files using Nutch's org.apache.nutch.indexer.IndexerOutputFormat (set via setOutputFormat()), which in turn uses org.apache.nutch.indexer.lucene.LuceneWriter to index the document output by Reduce on the local file system and move the created index to HDFS.

I can't figure out where to place Reporter.incrCounter() to reset the counter, because the indexing is done at the end of Reduce.

REGARDING HADOOP MAILING LIST:

Whenever I try to subscribe to the Hadoop mailing list, I get an error saying:

==
Hi. This is the qmail-send program at apache.org .
I'm afraid I wasn't able to deliver your message to the following addresses.
This is a permanent error; I've given up. Sorry it didn't work out.

<[email protected]>:
This mailing list has moved to common-user at hadoop.apache.org .
==

The strange thing is that I'm able to send mail to the list, but I don't receive any mail from it. I didn't even receive the replies to these mails; I found them on Google! What could be the problem?

Thanks in advance,
Prashant Ullegaddi,
Search and Information Extraction Lab,
IIIT-Hyderabad, INDIA.
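
P.S. To make the Reporter question concrete, here is a rough sketch of one pattern that might fit, assuming the goal is to keep reporting progress while the long indexing step runs at the end of Reduce. In the old mapred API, RecordWriter.close(Reporter) does receive a Reporter, and Reporter has a progress() method, so a background thread could call it while the index is built. Everything named below is hypothetical: the Progress interface is only a stand-in for Reporter so the sketch runs without Hadoop, and runWithHeartbeat is an illustrative helper, not a Hadoop or Nutch API.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HeartbeatSketch {

    /** Hypothetical stand-in for the progress() method of Hadoop's Reporter. */
    interface Progress {
        void progress();
    }

    /**
     * Runs 'work' on the calling thread while a daemon thread calls
     * reporter.progress() roughly every intervalMs milliseconds.
     * The heartbeat thread is stopped once 'work' returns or throws.
     */
    static void runWithHeartbeat(final Progress reporter, final long intervalMs,
                                 Runnable work) throws InterruptedException {
        Thread heartbeat = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        reporter.progress();
                        Thread.sleep(intervalMs);
                    }
                } catch (InterruptedException e) {
                    // Interrupted by the caller: the work is finished, stop reporting.
                }
            }
        });
        heartbeat.setDaemon(true);
        heartbeat.start();
        try {
            work.run(); // e.g. the index build and copy to HDFS (assumption)
        } finally {
            heartbeat.interrupt();
            heartbeat.join();
        }
    }

    public static void main(String[] args) throws Exception {
        final AtomicInteger calls = new AtomicInteger();
        Progress counting = new Progress() {
            public void progress() {
                calls.incrementAndGet();
            }
        };
        // Simulate a slow indexing step with a short sleep.
        runWithHeartbeat(counting, 20, new Runnable() {
            public void run() {
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        System.out.println("progress() calls: " + calls.get());
    }
}
```

If this is a reasonable approach, I assume the call to runWithHeartbeat would go inside close(Reporter) of the RecordWriter returned by the output format, wrapped around the part that writes and moves the index. Please correct me if that is the wrong place.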
