[ 
https://issues.apache.org/jira/browse/NUTCH-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172984#comment-16172984
 ] 

Karl Richter commented on NUTCH-2425:
-------------------------------------

Thanks for your feedback.

> Everyone is welcome to improve the documentation on the Wiki. Please, create 
> an account (https://wiki.apache.org/nutch/FrontPage?action=newaccount) and 
> send us your username over the mailing lists. Thanks!

That's good, but I don't know what the recommended procedure is, and I'm very 
reluctant to invest a lot of time in something a dev could do much more quickly, 
only to find out in the end that there's a much easier way I didn't know about. 
This is a general problem with documentation and guides. I hope you understand.

> Btw., urls can be a file or directory:
>
>     bin/nutch inject .../crawldb urls/seeds.txt
>         injects all URLs from this file
>     bin/nutch inject .../crawldb urls/
>         injects all URLs from urls/seeds.txt but also other files found in urls/

I was too hasty on that...
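
For reference, a minimal sketch of the seed setup as I understand it now. It 
assumes a working directory in which `urls` is (or will become) a directory, 
reuses the `seeds.txt` name from the quoted commands, and leaves the crawldb 
path elided exactly as quoted. The inject calls are shown only as comments, 
since they need a Nutch installation:

```shell
# Create the seed directory; `echo ... > urls` fails once urls is a directory.
mkdir -p urls

# Write the seed URL into a file inside that directory instead.
echo 'http://lucene.apache.org/nutch/' > urls/seeds.txt

# Per the quoted comment, either form should then be accepted
# (crawldb path elided as in the original):
#   bin/nutch inject .../crawldb urls/seeds.txt
#   bin/nutch inject .../crawldb urls/
```

With this layout the single-file and the directory form both read the same 
seed list, as long as `urls/` contains no other files.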

> Update GettingNutchRunningWithUbuntu wiki article
> -------------------------------------------------
>
>                 Key: NUTCH-2425
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2425
>             Project: Nutch
>          Issue Type: Task
>            Reporter: Karl Richter
>
> https://wiki.apache.org/nutch/GettingNutchRunningWithUbuntu contains some 
> errors (e.g. `echo 'http://lucene.apache.org/nutch/' > urls` where `urls` is 
> a directory) and obsolete parts (`conf/crawl-urlfilter.txt` is 
> `conf/regex-urlfilter.txt` in 2.x) and thus does not appear to be well tested.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
