Alexis,

> I've spent some time working on this as well. I've just put together a
> blog entry addressing the issues I ran into. See
> http://techvineyard.blogspot.com/2010/12/build-nutch-20.html
>

This is a great how-to for Nutch 2.0. Feel free to link to it from the Wiki;
it could be useful to others.
I don't remember seeing any of the issues you mentioned in the Nutch JIRA.
If you think something is a bug, why not report it? The same applies to
the fixes you suggested for Gora.


>
> In a nutchsell, I changed three pieces in Gora and Nutch code:
> - flush the datastore regularly in the Hadoop RecordWriter (in
> GoraOutputFormat)
> - wait for Hadoop job completion in the Fetcher job
> - ensure that the content length limit is not being exceeded in
> protocol-http plugin (only for MySQL datastore)
>
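For anyone wondering what the first change amounts to, here is a minimal sketch of the idea in plain Java. It deliberately does not use the real Gora or Hadoop classes: `BufferedStore`, `PeriodicFlushWriter` and `InMemoryStore` are hypothetical names made up for illustration, and the flush interval of 2 is arbitrary. The only point is that buffered records get pushed to the backend every N writes instead of piling up until the task closes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Gora-style datastore: puts are buffered until flush(). */
interface BufferedStore<T> {
    void put(T record);
    void flush();
}

/** Hypothetical RecordWriter-like wrapper that flushes every `flushEvery` records. */
class PeriodicFlushWriter<T> {
    private final BufferedStore<T> store;
    private final int flushEvery;
    private int pending = 0; // records written since the last flush

    PeriodicFlushWriter(BufferedStore<T> store, int flushEvery) {
        if (flushEvery <= 0) {
            throw new IllegalArgumentException("flushEvery must be positive");
        }
        this.store = store;
        this.flushEvery = flushEvery;
    }

    void write(T record) {
        store.put(record);
        // Flush as soon as the interval is reached, not only at close().
        if (++pending >= flushEvery) {
            store.flush();
            pending = 0;
        }
    }

    /** Flush whatever is left when the task finishes. */
    void close() {
        if (pending > 0) {
            store.flush();
            pending = 0;
        }
    }
}

/** In-memory store used only to demonstrate the flushing pattern. */
class InMemoryStore implements BufferedStore<String> {
    final List<String> buffer = new ArrayList<>();
    final List<String> persisted = new ArrayList<>();
    int flushCount = 0;

    public void put(String record) {
        buffer.add(record);
    }

    public void flush() {
        persisted.addAll(buffer);
        buffer.clear();
        flushCount++;
    }
}

public class PeriodicFlushDemo {
    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        PeriodicFlushWriter<String> writer = new PeriodicFlushWriter<>(store, 2);
        for (String url : new String[] {"a", "b", "c", "d", "e"}) {
            writer.write(url);
        }
        writer.close();
        // prints "3 flushes, 5 records persisted"
        System.out.println(store.flushCount + " flushes, " + store.persisted.size() + " records persisted");
    }
}
```

In a real job the interval would presumably be a configuration property; a larger value keeps more records in memory but means fewer round trips to the backend.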

The content length limit issue can also be fixed by modifying the Gora
schema for the MySQL backend. It would make sense to allow larger values by
default. Could you please open a JIRA for this?

Thanks

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
