[ 
https://issues.apache.org/jira/browse/NUTCH-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081639#comment-13081639
 ] 

Lewis John McGibbney commented on NUTCH-881:
--------------------------------------------

In Nutch trunk we currently only have the wiki as a repository for any Nutch 
2.0 information. Is this satisfactory?

As far as I can tell, the documentation for Gora_trunk is produced using Apache 
Forrest. I am reasonably familiar with Forrest, and it would be a great 
benefit, and would lessen the burden on the mailing lists, if we could 
maintain a clean distribution of documentation bundled into a /trunk/docs 
and/or branch-1.4/docs directory from now on and for all future official 
releases.

I think the only addition to the documentation we require on the website is a 
formal tutorial (available as part of the Apache Nutch website), which we need 
to add to /site resources and which we could maintain and direct users to as a 
one-stop resource for the Nutch branches/tags, with a similar, separate 
resource for trunk.

With specific reference to Nutch trunk: the Gora team, by comparison, has 
provided a quick-start guide followed by a more in-depth tutorial, an approach 
we could apply to both branch-1.4 and 2.0 trunk. The quick-start guide would 
only show users how to get trunk up and running; the formal tutorial would then 
provide in-depth documentation on completing a crawl with either Nutch 1.4 or 
trunk 2.0. Does this sound reasonable?

Andrzej provided some good comments in the correspondence on NUTCH-881 which 
should be addressed within any comprehensive documentation. I am very happy, 
and pretty keen, to get this issue resolved, but I think we need to agree on 
the specific tasks to be addressed, basically laying out the path for 
everything this issue encompasses.

> Good quality documentation for Nutch
> ------------------------------------
>
>                 Key: NUTCH-881
>                 URL: https://issues.apache.org/jira/browse/NUTCH-881
>             Project: Nutch
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 2.0
>            Reporter: Andrzej Bialecki 
>
> This is, and has been, a long standing request from Nutch users. This becomes 
> an acute need as we redesign Nutch 2.0, because the collective knowledge and 
> the Wiki will no longer be useful without massive amount of editing.
> IMHO the reference documentation should be in SVN, and not on the Wiki - the 
> Wiki is good for casual information and recipes but I think it's too messy 
> and not reliable enough as a reference.
> I propose to start with the following:
>  1. let's decide on the format of the docs. Each format has its own pros and 
> cons:
>   * HTML: easy to work with, but formatting may be messy unless we edit it by 
> hand, at which point it's no longer so easy... Good toolchains to convert to 
> other formats, but limited expressiveness of larger structures (e.g. book, 
> chapters, TOC, multi-column layouts, etc).
>   * Docbook: learning curve is higher, but not insurmountable... Naturally 
> yields very good structure. Figures/diagrams may be problematic - different 
> renderers (html, pdf) like to treat the scaling and placing somewhat 
> differently.
>   * Wiki-style (Confluence or TWiki): easy to use, but limited control over 
> larger structures. Maven Doxia can format cwiki, twiki, and a host of other 
> formats to e.g. html and pdf.
>   * other?
>  2. start documenting the main tools and the main APIs (e.g. the plugins and 
> all the extension points). We can of course reuse material from the Wiki and 
> from various presentations (e.g. the ApacheCon slides).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
