[ https://issues.apache.org/jira/browse/NUTCH-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1791:
----------------------------------------
    Fix Version/s:     (was: 2.3)
                   2.4

> Null pointer exceptions with gora-cassandra-0.4
> -----------------------------------------------
>
>                 Key: NUTCH-1791
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1791
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator, storage
>    Affects Versions: 2.3
>         Environment: dsc-cassandra-2.0.2, dsc-cassandra-2.0.7
>            Reporter: Koen Smets
>             Fix For: 2.4
>
>
> The latest nutch-2.x source checkout fails to run with Cassandra 2.0.2 (and 
> also Cassandra 2.0.7) as the storage backend, both in the normal Nutch 
> operations cycle (inject, generate, fetch) and in the JUnit test 
> {{TestGoraStorage}}:
> {code}
> 2014-06-03 11:24:23,495 INFO  connection.CassandraHostRetryService (CassandraHostRetryService.java:<init>(48)) - Downed Host Retry service started with queue size -1 and retry delay 10s
> 2014-06-03 11:24:23,535 INFO  service.JmxMonitor (JmxMonitor.java:registerMonitor(52)) - Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
> Exception in thread "main" java.lang.NullPointerException
>       at org.apache.gora.cassandra.query.CassandraResult.updatePersistent(CassandraResult.java:121)
>       at org.apache.gora.cassandra.query.CassandraResult.nextInner(CassandraResult.java:57)
>       at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:114)
>       at org.apache.nutch.storage.TestGoraStorage.readWrite(TestGoraStorage.java:93)
>       at org.apache.nutch.storage.TestGoraStorage.main(TestGoraStorage.java:230)
> {code}
> After injecting:
> {code}
> ksmets@precise64 ~/l/a/r/local> ./bin/nutch inject urls
> InjectorJob: starting at 2014-06-03 11:55:11
> InjectorJob: Injecting urlDir: urls
> InjectorJob: Using class org.apache.gora.cassandra.store.CassandraStore as the Gora storage class.
> InjectorJob: total number of urls rejected by filters: 0
> InjectorJob: total number of urls injected after normalization and filtering: 1
> Injector: finished at 2014-06-03 11:55:13, elapsed: 00:00:02
> ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -stats
> WebTable statistics start
> Statistics for WebTable:
> min score:    1.0
> retry 0:      1
> jobs: {db_stats-job_local1403358409_0001={jobID=job_local1403358409_0001, 
> jobName=db_stats, counters={File Input Format Counters ={BYTES_READ=0}, 
> Map-Reduce Framework={MAP_OUTPUT_MATERIALIZED_BYTES=97, MAP_INPUT_RECORDS=1, 
> REDUCE_SHUFFLE_BYTES=0, SPILLED_RECORDS=12, MAP_OUTPUT_BYTES=53, 
> COMMITTED_HEAP_BYTES=358612992, CPU_MILLISECONDS=0, SPLIT_RAW_BYTES=769, 
> COMBINE_INPUT_RECORDS=4, REDUCE_INPUT_RECORDS=6, REDUCE_INPUT_GROUPS=6, 
> COMBINE_OUTPUT_RECORDS=6, PHYSICAL_MEMORY_BYTES=0, REDUCE_OUTPUT_RECORDS=6, 
> VIRTUAL_MEMORY_BYTES=0, MAP_OUTPUT_RECORDS=4}, 
> FileSystemCounters={FILE_BYTES_READ=974145, FILE_BYTES_WRITTEN=1144369}, File 
> Output Format Counters ={BYTES_WRITTEN=225}}}}
> max score:    1.0
> TOTAL urls:   1
> status 0 (null):      1
> avg score:    1.0
> WebTable statistics: done
> ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -url http://example.com/
> key:  http://example.com/
> baseUrl:      null
> status:       0 (null)
> fetchTime:    1401789311270
> prevFetchTime:        0
> fetchInterval:        2592000
> retriesSinceFetch:    0
> modifiedTime: 0
> prevModifiedTime:     0
> protocolStatus:       (null)
> parseStatus:  (null)
> title:        null
> score:        1.0
> markers:      org.apache.gora.persistency.impl.DirtyMapWrapper@eb173c
> reprUrl:      null
> metadata _csh_ :      ?�
> {code}
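> After generating, the WebTable statistics report zero URLs and reading the 
> URL throws the same NullPointerException: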
> {code}
> ksmets@precise64 ~/l/a/r/local> ./bin/nutch generate -topN 1
> GeneratorJob: starting at 2014-06-03 11:55:38
> GeneratorJob: Selecting best-scoring urls due for fetch.
> GeneratorJob: starting
> GeneratorJob: filtering: true
> GeneratorJob: normalizing: true
> GeneratorJob: topN: 1
> GeneratorJob: finished at 2014-06-03 11:55:40, time elapsed: 00:00:02
> GeneratorJob: generated batch id: 1401789338-222512082 containing 1 URLs
> ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -stats
> WebTable statistics start
> Statistics for WebTable:
> jobs: {db_stats-job_local73029265_0001={jobID=job_local73029265_0001, 
> jobName=db_stats, counters={File Input Format Counters ={BYTES_READ=0}, 
> Map-Reduce Framework={MAP_OUTPUT_MATERIALIZED_BYTES=6, MAP_INPUT_RECORDS=0, 
> REDUCE_SHUFFLE_BYTES=0, SPILLED_RECORDS=0, MAP_OUTPUT_BYTES=0, 
> COMMITTED_HEAP_BYTES=358612992, CPU_MILLISECONDS=0, SPLIT_RAW_BYTES=769, 
> COMBINE_INPUT_RECORDS=0, REDUCE_INPUT_RECORDS=0, REDUCE_INPUT_GROUPS=0, 
> COMBINE_OUTPUT_RECORDS=0, PHYSICAL_MEMORY_BYTES=0, REDUCE_OUTPUT_RECORDS=0, 
> VIRTUAL_MEMORY_BYTES=0, MAP_OUTPUT_RECORDS=0}, 
> FileSystemCounters={FILE_BYTES_READ=974054, FILE_BYTES_WRITTEN=1144028}, File 
> Output Format Counters ={BYTES_WRITTEN=98}}}}
> TOTAL urls:   0
> WebTable statistics: done
> ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -url http://example.com/
> WebTableReader: java.lang.NullPointerException
>       at org.apache.gora.cassandra.query.CassandraResult.updatePersistent(CassandraResult.java:121)
>       at org.apache.gora.cassandra.query.CassandraResult.nextInner(CassandraResult.java:57)
>       at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:114)
>       at org.apache.nutch.crawl.WebTableReader.read(WebTableReader.java:238)
>       at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:494)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>       at org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:430)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
