[ https://issues.apache.org/jira/browse/NUTCH-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved NUTCH-1599.
-----------------------------------------
Resolution: Fixed
http://nutch.apache.org/#What+is+Apache+Nutch%3F
Please check it out and edit where you guys see fit.
Thanks for the input. This one is well overdue; in fact, it has been overdue
since we pushed 1.3 back in the dark ages!
> Obtain consensus on new description of Nutch
> --------------------------------------------
>
> Key: NUTCH-1599
> URL: https://issues.apache.org/jira/browse/NUTCH-1599
> Project: Nutch
> Issue Type: Improvement
> Components: documentation
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Fix For: 2.3, 1.8
>
>
> As we seem to be sustaining pushes and maintenance (touch wood) of two
> branches, I think it is about time we agreed on a more accurate description
> of what Nutch actually is.
> We currently have the following (taken directly from our site):
> {code:xml}
> Apache Nutch is an open source web-search software project. Stemming from
> Apache Lucene, it now builds on Apache Solr adding web-specifics, such as a
> crawler, a link-graph database and parsing support handled by Apache Tika for
> HTML and an array of other document formats.
> Nutch can run on a single machine, but gains a lot of its strength from
> running in a Hadoop cluster.
> The system can be enhanced (e.g. to parse other document formats) using a
> highly flexible, easily extensible and thoroughly maintained plugin
> infrastructure.
> {code}
> I propose something along the lines of:
> {code:xml}
> Apache Nutch is an open source web-search software project. Stemming from
> Apache Lucene, it is now developed and maintained by the community in two branches:
> * 1.x; description of 1.x here
> * 2.x; description of 2.x here
> Both branches add web-specifics, such as a crawler, a link-graph database and
> parsing support handled by Apache Tika for HTML and an array of other document
> formats.
> Nutch can run on a single machine, but gains a lot of its strength from
> running in a Hadoop cluster.
> The system can be enhanced (e.g. to parse other document formats) using a
> highly flexible, easily extensible and thoroughly maintained plugin
> infrastructure.
> {code}
> Any thoughts?