[
https://issues.apache.org/jira/browse/NUTCH-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172984#comment-16172984
]
Karl Richter commented on NUTCH-2425:
-------------------------------------
Thanks for your feedback.
> Everyone is welcome to improve the documentation on the Wiki. Please, create
> an account (https://wiki.apache.org/nutch/FrontPage?action=newaccount) and
> send us your username over the mailing lists. Thanks!
That's good, but I don't know what the recommended procedure is, and I'm wary
of investing a lot of time in something a dev could do much more quickly, only
to find out that there's a much easier way I didn't know about. This is a
general problem with documentation and guides. I hope you understand.
> Btw., urls can be a file or directory:
>
> bin/nutch inject .../crawldb urls/seeds.txt injects all URLs from this
> file
> bin/nutch inject .../crawldb urls/ injects all URLs from urls/seeds.txt
> but also other files found in urls/
I was too hasty on that...
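For reference, the directory variant could be sketched roughly as follows. This
is only a minimal sketch under assumptions: `CRAWLDB` is a hypothetical variable
standing in for the crawldb path (elided as `.../crawldb` above), and the
commands are assumed to be run from the directory that contains `bin/nutch`.

```shell
# Sketch only: CRAWLDB is a hypothetical variable for your crawldb path.

# Make `urls` a directory and write the seed list to a file inside it,
# rather than redirecting echo to `urls` itself.
mkdir -p urls
echo 'http://lucene.apache.org/nutch/' > urls/seeds.txt

# Inject all seed files found in urls/, but only when the Nutch launcher
# and a crawldb path are actually available.
if [ -x bin/nutch ] && [ -n "${CRAWLDB:-}" ]; then
  bin/nutch inject "$CRAWLDB" urls/
else
  echo "skipping inject: bin/nutch or CRAWLDB not available"
fi
```

Passing `urls/seeds.txt` instead of `urls/` would restrict the injection to that
one file.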
> Update GettingNutchRunningWithUbuntu wiki article
> -------------------------------------------------
>
> Key: NUTCH-2425
> URL: https://issues.apache.org/jira/browse/NUTCH-2425
> Project: Nutch
> Issue Type: Task
> Reporter: Karl Richter
>
> https://wiki.apache.org/nutch/GettingNutchRunningWithUbuntu contains some
> errors (e.g. `echo 'http://lucene.apache.org/nutch/' > urls` where `urls` is
> a directory) and obsolete parts (`conf/crawl-urlfilter.txt` is
> `conf/regex-urlfilter.txt` in 2.x) and thus doesn't appear to be well tested.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)