Hi Mateusz,

On Mon, Mar 3, 2014 at 10:24 AM, Mateusz Zakarczemny <
mateusz.zakarcze...@up2data.pl> wrote:

>
> I made quick review of documentation. Details below:
>
> http://wiki.apache.org/nutch/CommandLineOptions
>
> Webgraph classes - Present in the docs but do not exist in Nutch 2.
> Other Classes CrawlDBScanner - Present in the docs but does not exist in
> Nutch 2.
>
I take it you saw the HUGE table which differentiates between which tools
are available for which Nutch versions? This helps greatly in identifying
that the above two tools are not present in 2.X. Simple. Unless someone
writes the implementations then it is clear that they are not included in
2.X.



>
> http://wiki.apache.org/nutch/NutchConfigurationFiles
>
> Mentioned files which don't exist in Nutch 2:
> hadoop-site.xml
> job.xml
> mapred-default.xml
>
I've just fixed this now.


>
> http://wiki.apache.org/nutch/IndexStructure
>
> Refers to index-extra and index-static plugins but they aren't available
> in Nutch 2
>
I've improved this page; however, the specifics are NOT accurate yet. There
is more work required here.


> http://wiki.apache.org/nutch/SetupProxyForNutch
>
> Configure Nutch (Nutch 1.3)
> Title suggests that this section refers to Nutch 1.x only but I think this
> prop also exists in Nutch 2.
>
>
FIXED

>
> Nutch 2.x:
>
> http://wiki.apache.org/nutch/Nutch2Architecture
> As mentioned in the headline, the document is outdated. Maybe it should be removed?
>
>
Yeah, this is painfully old and severely out of date. The page does make it
clear that it is out of date, though.

>
> http://wiki.apache.org/nutch/NewScoring - invalid in Nutch 2.1
>
>
FIXED

> http://wiki.apache.org/nutch/NewScoringIndexingExample - invalid in Nutch
> 2.1
>
FIXED

>
> Now I see that in Nutch 2.x section some pages are equivalents of pages in
> "Configuration" section.
> Does this mean that the content in "Configuration" refers only to Nutch
> 1.x? I'm not sure because some pages from "Configuration" do not appear in
> Nutch 2.x but seem valid for branch 2.
>
>
Can you make it explicit what you mean here? I don't fully understand.
Thank you.

> If I get enough mana with Nutch I will try :)
>

Great, it would be very useful to have someone overhaul the docs again. It
really would :)

>
>
> Later I will try to propose some documentation structure on users mailing
> list.
>

Great


>
> If a feature is documented on the wiki and a developer changes its
> behaviour while fixing it, the doc should be updated or at least marked as
> outdated.
>

Then we either have documentation which is unmaintainable... or simply
unmaintained!!!


> How else could it be done? If I read a doc, should I check its last
> modification date and compare it with the dates of the issues related to it?
> What is essential here is a good wiki structure. It should enable a
> developer to quickly identify which pages are related to the issue.
>

The wiki is pretty well structured. You seem to have been able to find many
flaws so presumably you don't struggle to find content, you just are not
quite happy with the relevance and accuracy of the content you find! This
is why we always need people to spot the flaws in the documentation and
point them out to us. Thanks again for your attention to detail.


>
> Is there any wiki todo list? I know that some pages are marked to be
> cleaned up.
>

Yes this is where we can start.


> But what about pages that should be created from scratch?
>

Well this depends entirely on who wishes to write a new page from scratch.


> Do you think jira documentation component could be used for that? (
> https://issues.apache.org/jira/issues?jql=project%20%3D%20NUTCH%20AND%20component%20%3D%20documentation
> )
> Maybe we should mention this path on the wiki?
>

Great idea. Very good idea. I've just added this to the top of the
FrontPage.


>
> I think yes. I believe that such an approach is useful for people who
> encountered problems with some specific part of Nutch but do not want to
> contribute continuously.
> I'm thinking about simplifying a scenario like "Hey this doc is wrong. I
> will send pull request (jira issue) with fixes". In my opinion, in this kind
> of situation people will not want to subscribe to the mailing list and ask
> for access to doc editing. This also creates the possibility to review such
> a request.
>

The process of asking for Karma takes less than 5 minutes.


>
> I believe that some people would like to contribute some small pieces of
> the doc, but if the process is too complicated they are too lazy to do it.
> It is normal for our kind. We were lazy so we have created computers and
> then crawlers :)
>

+1 :)




-- 
*Lewis*
