As far as I can see they are separate. The tutorials are clearly under different subsections, and the Nutch 2.x docs have their own section as well.
I made a quick review of the documentation. Details below:

http://wiki.apache.org/nutch/CommandLineOptions

   Webgraph classes - Present in the docs but do not exist in Nutch 2.
   Other Classes CrawlDBScanner - Present in the docs but does not exist
   in Nutch 2.


http://wiki.apache.org/nutch/NutchConfigurationFiles

   Mentions files which don't exist in Nutch 2:
   hadoop-site.xml
   job.xml
   mapred-default.xml


http://wiki.apache.org/nutch/IndexStructure

   Refers to index-extra and index-static plugins but they aren't
   available in Nutch 2


http://wiki.apache.org/nutch/SetupProxyForNutch

   Configure Nutch (Nutch 1.3)
   The title suggests that this section refers to Nutch 1.x only, but I
   think this property also exists in Nutch 2.


Nutch 2.x:

   http://wiki.apache.org/nutch/Nutch2Architecture
   As mentioned in its headline, the document is outdated. Maybe it should be removed?

   http://wiki.apache.org/nutch/NewScoring - invalid in Nutch 2.1
   http://wiki.apache.org/nutch/NewScoringIndexingExample - invalid in
   Nutch 2.1

   Now I see that in the Nutch 2.x section some pages are equivalents of
   pages in the "Configuration" section.
   Does this mean that the content in "Configuration" refers only to
   Nutch 1.x? I'm not sure, because some pages from "Configuration" do
   not appear in Nutch 2.x but seem valid for branch 2.

Release Report - http://s.apache.org/PGa
I did not notice this. The division into bugs and improvements is very nice.

    If a new user looks at Nutch, he will not check the changelog but
    the documentation.

Is this your opinion or are you commenting from a wider audience's perspective?
Only my personal opinion and experience.

    I think the new user should be provided with clear information
about which branch to choose.

I agree with this. This is why the lists exist. You can ask questions. You can also read some archives. It takes a minimal, well-spent investment of time to dig up what others have asked many, many times. Don't get me wrong, I am all for informing people about the software... however I am not in the immediate position to write a decent quality book on Nutch which would do the community and software justice. If you are then please do.
If I get enough mana with Nutch I will try :)

What is more, the docs should be divided into branch 1 and branch 2.

Please see the table of contents on the wiki. Please also see my comments above.
As I described above, in my opinion the docs are mixed.

    Pages could link to each other, but there should be a clean branch
    tree in the docs, just as in source code. You do not mix packages
    from two branches; you keep them in separate repos.


ditto
Later I will try to propose a documentation structure on the users mailing list.

    I don't think that documentation is essential for bugs. Only for
    new features or refactoring. It doesn't have to be a big document.
    It just has to exist.

But what happens if fixing a bug changes functionality? Then what?
If a feature is documented on the wiki and a developer changes its behaviour while fixing it, the doc should be updated or at least marked as outdated. How else could it be done? When I read a doc, should I check its last modification date and compare it with the dates of the issues related to it? What is essential here is a good wiki structure. It should enable a developer to quickly identify which pages are related to the issue.

    I know that sometimes developers don't have time to create
    documentation. But in such a case they should create a new task for
    that doc. Otherwise nobody knows that the doc is missing and nobody
    can help.


Not true. All you need to do is request Karma for the project wiki and you can contribute to whatever you feel is missing. I don't accept this argument, sorry.
Is there any wiki todo list? I know that some pages are marked to be cleaned up. But what about pages that should be created from scratch? Do you think the JIRA documentation component could be used for that? (https://issues.apache.org/jira/issues?jql=project%20%3D%20NUTCH%20AND%20component%20%3D%20documentation)
Maybe we should mention this path on the wiki?
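
For reference, the JQL behind the link above decodes to "project = NUTCH AND component = documentation". A minimal Python sketch of how such a filter link could be built for a wiki page; the helper name jira_filter_url is only illustrative and not part of any Nutch or JIRA tooling:

```python
from urllib.parse import quote, unquote

# Base of the JIRA search URL, taken from the link quoted above.
JIRA_SEARCH = "https://issues.apache.org/jira/issues?jql="


def jira_filter_url(jql: str) -> str:
    """Return a search link for the given JQL (hypothetical helper)."""
    # safe="" percent-encodes spaces and '=' as well, matching the link above.
    return JIRA_SEARCH + quote(jql, safe="")


url = jira_filter_url("project = NUTCH AND component = documentation")
print(url)
# Decoding the query part gives back the readable JQL.
print(unquote(url.split("jql=", 1)[1]))
```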

    I am not saying that Confluence is best for this project. But in
    my opinion Nutch docs should be moved to some community/social
    solution. It would be great if it enabled comments and pull
    requests (like on GitHub) to improve them.


AFAICT the wiki we currently have IS community oriented. Anyone over the years who has wished to add/edit has been granted Karma to do so. Are you really saying that enabling pull requests via GitHub is a better way than simply granting someone Karma to edit a page as they wish?
I think yes. I believe that such an approach is useful for people who have encountered problems with some specific part of Nutch but do not want to contribute continuously. I'm thinking about simplifying a scenario like "Hey, this doc is wrong. I will send a pull request (JIRA issue) with fixes". In my opinion, in this kind of situation people will not want to subscribe to the mailing list and ask for access to doc editing. This also creates the possibility to review such a request.

Honestly I haven't seen anything from your commentary which would suggest benefits for Nutch as a whole... I am trying NOT to be pessimistic, but I am just struggling to see your point here. If the wiki is outdated... then we should update it. Not change to another solution just so we can receive pull requests for documentation. There is an argument to make it as easy as possible to contribute documentation to Nutch. However as far as I can see, there are not crowds of people rushing to contribute. Please don't take these comments negatively. I am behind any motion to make documentation better. I just don't see eye-to-eye with some of your points.
I believe that some people would like to contribute some small pieces of the doc, but if the process is too complicated they are too lazy to do it. It is normal for our kind. We were lazy so we have created computers and then crawlers :)

