[ https://issues.apache.org/jira/browse/NUTCH-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172984#comment-16172984 ]
Karl Richter commented on NUTCH-2425:
-------------------------------------

Thanks for your feedback.

> Everyone is welcome to improve the documentation on the Wiki. Please, create
> an account (https://wiki.apache.org/nutch/FrontPage?action=newaccount) and
> send us your username over the mailing lists. Thanks!

That's good, but I have no idea what the ideal procedure is. I'm very afraid of investing a lot of time in something a developer could do much more quickly, only to find out in the end that there's a much easier way I didn't know about. This is a generic problem with documentation and guides; I hope you understand.

> Btw., urls can be a file or directory:
>
> bin/nutch inject .../crawldb urls/seeds.txt injects all URLs from this
> file
> bin/nutch inject .../crawldb urls/ injects all URLs from urls/seeds.txt
> but also other files found in urls/

I was too hasty on that...

> Update GettingNutchRunningWithUbuntu wiki article
> -------------------------------------------------
>
> Key: NUTCH-2425
> URL: https://issues.apache.org/jira/browse/NUTCH-2425
> Project: Nutch
> Issue Type: Task
> Reporter: Karl Richter
>
> https://wiki.apache.org/nutch/GettingNutchRunningWithUbuntu contains some
> errors (e.g. `echo 'http://lucene.apache.org/nutch/' > urls` where `urls` is
> a directory) and obsolete parts (`conf/crawl-urlfilter.txt` is
> `conf/regex-urlfilter.txt` in 2.x) and thus don't appear to be well tested.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
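A minimal sketch of a seed setup that avoids the `echo ... > urls` error described in the issue, assuming a POSIX shell run from the Nutch directory. The seed URL and `urls/seeds.txt` are taken from the discussion. `crawl/crawldb` is a hypothetical stand-in for the elided `.../crawldb` path. The two `bin/nutch inject` lines are left commented out so the snippet runs without a Nutch installation; they only illustrate the file vs. directory forms quoted above.

```shell
#!/bin/sh
# Create the seed directory first. Redirecting output straight into an
# existing directory (echo ... > urls) fails with "Is a directory".
mkdir -p urls

# Write the seed URL (one URL per line) into a file inside the directory.
echo 'http://lucene.apache.org/nutch/' > urls/seeds.txt

# Either form can then be passed to the injector.
# crawl/crawldb is a hypothetical path standing in for .../crawldb:
# bin/nutch inject crawl/crawldb urls/seeds.txt   # injects only this file
# bin/nutch inject crawl/crawldb urls/            # injects every file in urls/
```

Pointing `inject` at the directory is the more forgiving choice when seeds are split across several files, since every file in `urls/` is picked up.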