[ 
https://issues.apache.org/jira/browse/NUTCH-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899776#action_12899776
 ] 

Alex McLintock commented on NUTCH-881:
--------------------------------------

So what is new in Nutch 2.0 that doesn't appear in Nutch 1.x?

Gora is the main thing which comes to mind. 

How do the config files differ?
How does Nutch's use of Hadoop differ? 
How do the command lines differ? (Presumably you need different command lines 
to say *where* to store the crawldb, right?)

Anything else?

> Good quality documentation for Nutch
> ------------------------------------
>
>                 Key: NUTCH-881
>                 URL: https://issues.apache.org/jira/browse/NUTCH-881
>             Project: Nutch
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 2.0
>            Reporter: Andrzej Bialecki 
>
> This is, and has been, a long-standing request from Nutch users. This becomes 
> an acute need as we redesign Nutch 2.0, because the collective knowledge and 
> the Wiki will no longer be useful without a massive amount of editing.
> IMHO the reference documentation should be in SVN, and not on the Wiki - the 
> Wiki is good for casual information and recipes but I think it's too messy 
> and not reliable enough as a reference.
> I propose to start with the following:
>  1. let's decide on the format of the docs. Each format has its own pros and 
> cons:
>   * HTML: easy to work with, but formatting may be messy unless we edit it by 
> hand, at which point it's no longer so easy... Good toolchains to convert to 
> other formats, but limited expressiveness of larger structures (e.g. book, 
> chapters, TOC, multi-column layouts, etc).
>   * DocBook: learning curve is higher, but not insurmountable... Naturally 
> yields very good structure. Figures/diagrams may be problematic - different 
> renderers (html, pdf) like to treat the scaling and placing somewhat 
> differently.
>   * Wiki-style (Confluence or TWiki): easy to use, but limited control over 
> larger structures. Maven Doxia can format cwiki, twiki, and a host of other 
> formats to e.g. html and pdf.
>   * other?
>  2. start documenting the main tools and the main APIs (e.g. the plugins and 
> all the extension points). We can of course reuse material from the Wiki and 
> from various presentations (e.g. the ApacheCon slides).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
