My vote is thumbs down: -1
I am only involved in Nutch 2.0, and this would put it on the back burner...
Please read these articles if you struggle with using Nutch 2.0, and give
feedback so that we can improve the doc/code/architecture.
Nutch 2.0 (trunk)
I'm glad to hear that there are at least 2 people in the community who do
business in their field and proudly use a Nutch-based crawler together with
Cassandra, storing the data through Gora. That would not have been possible
with Nutch 1.x.
Maybe this has been widely discussed already. IMHO, crawl segments are
hard to maintain and easily lost. If you want to do that, HDFS is what you
are looking for. Even Yahoo has given up and is now using Microsoft's updated
crawl information to implement search. They use HBase, which, by
the way, is compatible with Nutch 2.0.
Take a look:
http://developer.yahoo.com/events/hadoopsummit2011/agenda.html#22 (sorry, I
don't think any video of the summit is available yet; not sure why)
On Mon, Sep 19, 2011 at 1:05 AM, Julien Nioche <
Here is my vote:
> +1 : Shelve 2.0 and move 1.4 to trunk
> On 18 September 2011 10:21, Julien Nioche
>> Following the discussions on the dev list about the future of Nutch
>> 2.0, I would like to call for a vote on moving Nutch 2.0 from trunk to a
>> separate branch, promoting 1.4 to trunk, and considering 2.0 unmaintained. The
>> arguments for and against can be found in the thread I mentioned.
>> The vote is open for the next 72 hours.
>> [ ] +1 : Shelve 2.0 and move 1.4 to trunk
>> [ ] 0 : No opinion
>> [ ] -1 : Bad idea. Please give justification.
>> Open Source Solutions for Text Engineering