I am having trouble running the injector in nutch-1.4 on Hadoop; the same
command works fine with nutch-1.3. As you can see, the list of URLs is
loaded from HDFS correctly (Map input records=66906), but the map output
contains no records (Map output records=0). Could this be caused by
broken URL filtering?
ponto:(crawler)runtime/deploy>bin/nutch inject /czcrawl/db /czcrawl/seeds
11/10/13 17:56:25 INFO crawl.Injector: Injector: starting at 2011-10-13 17:56:25
11/10/13 17:56:25 INFO crawl.Injector: Injector: crawlDb: /czcrawl/db
11/10/13 17:56:25 INFO crawl.Injector: Injector: urlDir: /czcrawl/seeds
11/10/13 17:56:25 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
11/10/13 17:56:28 INFO mapred.FileInputFormat: Total input paths to process : 1
11/10/13 17:56:29 INFO mapred.JobClient: Running job: job_201110091645_0032
11/10/13 17:56:30 INFO mapred.JobClient: map 0% reduce 0%
11/10/13 17:56:52 INFO mapred.JobClient: map 50% reduce 0%
11/10/13 17:56:53 INFO mapred.JobClient: map 100% reduce 0%
11/10/13 17:57:05 INFO mapred.JobClient: map 100% reduce 100%
11/10/13 17:57:10 INFO mapred.JobClient: Job complete: job_201110091645_0032
11/10/13 17:57:10 INFO mapred.JobClient: Counters: 27
11/10/13 17:57:10 INFO mapred.JobClient: Job Counters
11/10/13 17:57:10 INFO mapred.JobClient: Launched reduce tasks=1
11/10/13 17:57:10 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=20455
11/10/13 17:57:10 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
11/10/13 17:57:10 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
11/10/13 17:57:10 INFO mapred.JobClient: Rack-local map tasks=1
11/10/13 17:57:10 INFO mapred.JobClient: Launched map tasks=2
11/10/13 17:57:10 INFO mapred.JobClient: Data-local map tasks=1
11/10/13 17:57:10 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10356
11/10/13 17:57:10 INFO mapred.JobClient: File Input Format Counters
11/10/13 17:57:10 INFO mapred.JobClient: Bytes Read=1283144
11/10/13 17:57:10 INFO mapred.JobClient: File Output Format Counters
11/10/13 17:57:10 INFO mapred.JobClient: Bytes Written=86
11/10/13 17:57:10 INFO mapred.JobClient: FileSystemCounters
11/10/13 17:57:10 INFO mapred.JobClient: FILE_BYTES_READ=6
11/10/13 17:57:10 INFO mapred.JobClient: HDFS_BYTES_READ=1283358
11/10/13 17:57:10 INFO mapred.JobClient: FILE_BYTES_WRITTEN=89486
11/10/13 17:57:10 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=86
11/10/13 17:57:10 INFO mapred.JobClient: Map-Reduce Framework
11/10/13 17:57:10 INFO mapred.JobClient: Map output materialized bytes=12
11/10/13 17:57:10 INFO mapred.JobClient: Map input records=66906
11/10/13 17:57:10 INFO mapred.JobClient: Reduce shuffle bytes=6
11/10/13 17:57:10 INFO mapred.JobClient: Spilled Records=0
11/10/13 17:57:10 INFO mapred.JobClient: Map output bytes=0
11/10/13 17:57:10 INFO mapred.JobClient: Map input bytes=1280141
11/10/13 17:57:10 INFO mapred.JobClient: Combine input records=0
11/10/13 17:57:10 INFO mapred.JobClient: SPLIT_RAW_BYTES=214
11/10/13 17:57:10 INFO mapred.JobClient: Reduce input records=0
11/10/13 17:57:10 INFO mapred.JobClient: Reduce input groups=0
11/10/13 17:57:10 INFO mapred.JobClient: Combine output records=0
11/10/13 17:57:10 INFO mapred.JobClient: Reduce output records=0
11/10/13 17:57:10 INFO mapred.JobClient: Map output records=0
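
To make the symptom easier to spot in longer logs, here is a minimal Python sketch that pulls the counters out of JobClient output and flags the case where the mappers read records but emitted none. It is not part of Nutch or Hadoop; the names parse_counters and all_records_dropped are made up for illustration, and it assumes each counter sits on a single, unwrapped log line.

```python
import re

# Matches the "name=value" tail of a JobClient counter line, e.g.
# "... INFO mapred.JobClient: Map input records=66906"
COUNTER_RE = re.compile(r"mapred\.JobClient:\s*(?P<name>[^=]+?)=(?P<value>\d+)\s*$")


def parse_counters(log_text):
    """Return {counter name: int value} for every counter line found.

    Hypothetical helper: assumes one counter per line, not wrapped by mail.
    """
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


def all_records_dropped(counters):
    """True when the mappers read input records but emitted none."""
    read = counters.get("Map input records", 0)
    emitted = counters.get("Map output records", 0)
    return read > 0 and emitted == 0


# Three counter lines copied from the log above.
sample = """\
11/10/13 17:57:10 INFO mapred.JobClient: Map input records=66906
11/10/13 17:57:10 INFO mapred.JobClient: Map output bytes=0
11/10/13 17:57:10 INFO mapred.JobClient: Map output records=0
"""

counters = parse_counters(sample)
print(counters["Map input records"], counters["Map output records"])
print(all_records_dropped(counters))
```

On the three sample lines this reports 66906 records read and 0 emitted, which is the situation described above: every seed URL was rejected somewhere inside the map step. The check only confirms the symptom; it does not say which filter or normalizer dropped the URLs.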