[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reopened NUTCH-1325:
----------------------------------
Hi Tejas, can you check this out before 1.8? I cannot seem to get it to work
properly.
{code}
markus@midas:~/projects/apache/nutch/trunk/runtime/local$ bin/nutch hostdb -Dplugin.includes="urlfilter-(domain)" crawl/hostdb -crawldb crawl/crawldb/ -checkAll
HostDb: crawldb: crawl/crawldb
HostDb: checking all hosts
HostDb: starting at 2014-03-04 14:02:45
http://.../: existing_unknown_host Version: 1
Homepage url:
Score: 0.0
Last check: 2014-03-04 14:02:47
Total records: 0
Unfetched: 0
Fetched: 0
Gone: 0
Perm redirect: 0
Temp redirect: 0
Not modified: 0
Total failures: 1
DNS failures: 1
Connection failures: 0
java.lang.NullPointerException
    at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1030)
    at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1072)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:74)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:586)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.nutch.util.hostdb.HostDb$HostDbReducer$ResolverThread.run(HostDb.java:469)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
{code}
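The trace shows the failing write coming from a ResolverThread running inside a ThreadPoolExecutor. One possible reading, not verified against the HostDb source, is that a resolver thread is still writing after the reduce task's output has been closed. The sketch below only illustrates the general ordering that avoids that kind of failure: drain the pool first, close the output afterwards. `Sink`, `writeAll` and `DrainBeforeClose` are hypothetical names for illustration, not Nutch or Hadoop classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DrainBeforeClose {

    /** Hypothetical stand-in for a record writer; not a Hadoop class. */
    static final class Sink {
        private StringBuilder out = new StringBuilder();
        private final AtomicInteger written = new AtomicInteger();

        synchronized void write(String record) {
            // Once close() has run, 'out' is null and this line throws
            // NullPointerException, loosely mimicking a write to a closed writer.
            out.append(record).append('\n');
            written.incrementAndGet();
        }

        synchronized void close() {
            out = null;
        }

        int written() {
            return written.get();
        }
    }

    /** Writes one record per host from worker threads, then closes the sink. */
    static int writeAll(List<String> hosts, int threads) {
        Sink sink = new Sink();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String host : hosts) {
            pool.execute(() -> sink.write(host));
        }
        // Stop accepting new tasks and wait for the queued ones to finish
        // BEFORE the sink is closed.
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        sink.close();
        return sink.written();
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("a.example", "b.example", "c.example");
        System.out.println("records written: " + writeAll(hosts, 2));
    }
}
```

If `sink.close()` were moved above `pool.shutdown()`, any task still queued would fail inside `write`, which is the shape of failure this guess assumes.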
> HostDB for Nutch
> ----------------
>
> Key: NUTCH-1325
> URL: https://issues.apache.org/jira/browse/NUTCH-1325
> Project: Nutch
> Issue Type: New Feature
> Reporter: Markus Jelsma
> Assignee: Tejas Patil
> Fix For: 1.8
>
> Attachments: NUTCH-1325-1.6-1.patch, NUTCH-1325-trunk-v3.patch,
> NUTCH-1325-trunk-v4.patch, NUTCH-1325.trunk.v2.path
>
>
> A HostDB for Nutch and associated tools to create and read a database
> containing information on hosts.