[ https://issues.apache.org/jira/browse/NUTCH-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699362#comment-13699362 ]
Markus Jelsma commented on NUTCH-1599:
--------------------------------------
nice! thanks
> Obtain consensus on new description of Nutch
> --------------------------------------------
>
> Key: NUTCH-1599
> URL: https://issues.apache.org/jira/browse/NUTCH-1599
> Project: Nutch
> Issue Type: Improvement
> Components: documentation
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Fix For: 2.3, 1.8
>
>
> As we seem to be sustaining pushes and maintenance (touch wood) of two
> branches, I think it is about time we agreed on a more accurate description
> of what Nutch actually is.
> We currently have (taken directly from our site)
> {code:xml}
> Apache Nutch is an open source web-search software project. Stemming from
> Apache Lucene, it now builds on Apache Solr, adding web-specifics such as a
> crawler, a link-graph database and parsing support handled by Apache Tika for
> HTML and an array of other document formats.
> Nutch can run on a single machine, but gains a lot of its strength from
> running in a Hadoop cluster.
> The system can be enhanced (e.g. other document formats can be parsed) using a
> highly flexible, easily extensible and thoroughly maintained plugin
> infrastructure.
> {code}
> I suggest/propose something along the lines of
> {code:xml}
> Apache Nutch is an open source web-search software project. Stemming from
> Apache Lucene, Nutch is now developed and maintained by the community in two branches:
> * 1.x; description of 1.x here
> * 2.x; description of 2.x here
> Both branches add web-specifics, such as a crawler, a link-graph database and
> parsing support handled by Apache Tika for HTML and an array of other document
> formats.
> Nutch can run on a single machine, but gains a lot of its strength from
> running in a Hadoop cluster.
> The system can be enhanced (e.g. other document formats can be parsed) using a
> highly flexible, easily extensible and thoroughly maintained plugin
> infrastructure.
> {code}
> Any thoughts?
--
This message is automatically generated by JIRA.