Koen Smets created NUTCH-1791:
---------------------------------

             Summary: Null pointer exceptions with gora-cassandra-0.4
                 Key: NUTCH-1791
                 URL: https://issues.apache.org/jira/browse/NUTCH-1791
             Project: Nutch
          Issue Type: Bug
          Components: generator, storage
    Affects Versions: 2.3
         Environment: dsc-cassandra-2.0.2, dsc-cassandra-2.0.7
            Reporter: Koen Smets
             Fix For: 2.3


The latest nutch-2.x source checkout fails to run with Cassandra 2.0.2 (and also
Cassandra 2.0.7) as the storage backend, both during the normal Nutch crawl
cycle (inject, generate, fetch) and in the JUnit test {{TestGoraStorage}}, which
fails with a NullPointerException in {{CassandraResult.updatePersistent}}:

{code}
2014-06-03 11:24:23,495 INFO  connection.CassandraHostRetryService (CassandraHostRetryService.java:<init>(48)) - Downed Host Retry service started with queue size -1 and retry delay 10s
2014-06-03 11:24:23,535 INFO  service.JmxMonitor (JmxMonitor.java:registerMonitor(52)) - Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
Exception in thread "main" java.lang.NullPointerException
        at org.apache.gora.cassandra.query.CassandraResult.updatePersistent(CassandraResult.java:121)
        at org.apache.gora.cassandra.query.CassandraResult.nextInner(CassandraResult.java:57)
        at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:114)
        at org.apache.nutch.storage.TestGoraStorage.readWrite(TestGoraStorage.java:93)
        at org.apache.nutch.storage.TestGoraStorage.main(TestGoraStorage.java:230)
{code}

After injecting, the URL is stored and can be read back normally:

{code}
ksmets@precise64 ~/l/a/r/local> ./bin/nutch inject urls
InjectorJob: starting at 2014-06-03 11:55:11
InjectorJob: Injecting urlDir: urls
InjectorJob: Using class org.apache.gora.cassandra.store.CassandraStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2014-06-03 11:55:13, elapsed: 00:00:02

ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -stats
WebTable statistics start
Statistics for WebTable:
min score:      1.0
retry 0:        1
jobs:   {db_stats-job_local1403358409_0001={jobID=job_local1403358409_0001, 
jobName=db_stats, counters={File Input Format Counters ={BYTES_READ=0}, 
Map-Reduce Framework={MAP_OUTPUT_MATERIALIZED_BYTES=97, MAP_INPUT_RECORDS=1, 
REDUCE_SHUFFLE_BYTES=0, SPILLED_RECORDS=12, MAP_OUTPUT_BYTES=53, 
COMMITTED_HEAP_BYTES=358612992, CPU_MILLISECONDS=0, SPLIT_RAW_BYTES=769, 
COMBINE_INPUT_RECORDS=4, REDUCE_INPUT_RECORDS=6, REDUCE_INPUT_GROUPS=6, 
COMBINE_OUTPUT_RECORDS=6, PHYSICAL_MEMORY_BYTES=0, REDUCE_OUTPUT_RECORDS=6, 
VIRTUAL_MEMORY_BYTES=0, MAP_OUTPUT_RECORDS=4}, 
FileSystemCounters={FILE_BYTES_READ=974145, FILE_BYTES_WRITTEN=1144369}, 
File Output Format Counters ={BYTES_WRITTEN=225}}}}
max score:      1.0
TOTAL urls:     1
status 0 (null):        1
avg score:      1.0
WebTable statistics: done

ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -url http://example.com/
key:    http://example.com/
baseUrl:        null
status: 0 (null)
fetchTime:      1401789311270
prevFetchTime:  0
fetchInterval:  2592000
retriesSinceFetch:      0
modifiedTime:   0
prevModifiedTime:       0
protocolStatus: (null)
parseStatus:    (null)
title:  null
score:  1.0
markers:        org.apache.gora.persistency.impl.DirtyMapWrapper@eb173c
reprUrl:        null
metadata _csh_ :        ?�
{code}
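To check that the injected row holds sane values before generate runs, the numeric fields in the readdb output above can be decoded. A minimal Java sketch, assuming fetchTime is in epoch milliseconds, fetchInterval is in seconds, and the log's local time zone is UTC+2 (inferred by comparing fetchTime with the "InjectorJob: starting at" line; the report does not state the zone). The class and method names are hypothetical:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class DecodeWebPageFields {
    // Assumption: log timestamps are local time at UTC+2, inferred from the report.
    static final ZoneOffset LOG_OFFSET = ZoneOffset.ofHours(2);

    // Assumption: fetchTime is milliseconds since the Unix epoch.
    static LocalDateTime fetchTimeAsLocal(long fetchTimeMillis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(fetchTimeMillis), LOG_OFFSET);
    }

    // Assumption: fetchInterval is expressed in seconds.
    static long fetchIntervalDays(long fetchIntervalSeconds) {
        return Duration.ofSeconds(fetchIntervalSeconds).toDays();
    }

    public static void main(String[] args) {
        // Values copied from the "readdb -url" output after injecting.
        long fetchTime = 1401789311270L;
        long fetchInterval = 2592000L;
        System.out.println("fetchTime     = " + fetchTimeAsLocal(fetchTime));
        System.out.println("fetchInterval = " + fetchIntervalDays(fetchInterval) + " days");
    }
}
```

Run as-is, it prints 2014-06-03T11:55:11.270, matching the InjectorJob start time, and 30 days, consistent with Nutch's default fetch interval (db.fetch.interval.default = 2592000).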

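After generating, {{readdb -stats}} reports no URLs and reading the injected URL
back fails with the same NullPointerException: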

{code}
ksmets@precise64 ~/l/a/r/local> ./bin/nutch generate -topN 1
GeneratorJob: starting at 2014-06-03 11:55:38
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 1
GeneratorJob: finished at 2014-06-03 11:55:40, time elapsed: 00:00:02
GeneratorJob: generated batch id: 1401789338-222512082 containing 1 URLs

ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -stats
WebTable statistics start
Statistics for WebTable:
jobs:   {db_stats-job_local73029265_0001={jobID=job_local73029265_0001, 
jobName=db_stats, counters={File Input Format Counters ={BYTES_READ=0}, 
Map-Reduce Framework={MAP_OUTPUT_MATERIALIZED_BYTES=6, MAP_INPUT_RECORDS=0, 
REDUCE_SHUFFLE_BYTES=0, SPILLED_RECORDS=0, MAP_OUTPUT_BYTES=0, 
COMMITTED_HEAP_BYTES=358612992, CPU_MILLISECONDS=0, SPLIT_RAW_BYTES=769, 
COMBINE_INPUT_RECORDS=0, REDUCE_INPUT_RECORDS=0, REDUCE_INPUT_GROUPS=0, 
COMBINE_OUTPUT_RECORDS=0, PHYSICAL_MEMORY_BYTES=0, REDUCE_OUTPUT_RECORDS=0, 
VIRTUAL_MEMORY_BYTES=0, MAP_OUTPUT_RECORDS=0}, 
FileSystemCounters={FILE_BYTES_READ=974054, FILE_BYTES_WRITTEN=1144028}, 
File Output Format Counters ={BYTES_WRITTEN=98}}}}
TOTAL urls:     0
WebTable statistics: done

ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -url http://example.com/
WebTableReader: java.lang.NullPointerException
        at org.apache.gora.cassandra.query.CassandraResult.updatePersistent(CassandraResult.java:121)
        at org.apache.gora.cassandra.query.CassandraResult.nextInner(CassandraResult.java:57)
        at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:114)
        at org.apache.nutch.crawl.WebTableReader.read(WebTableReader.java:238)
        at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:494)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:430)
{code}
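When matching a generate run against the log, the batch id appears to start with the generation time in epoch seconds. This is an assumption, supported only by 1401789338 lining up with the "GeneratorJob: starting at" line. A minimal Java sketch with hypothetical helper names, again assuming the log's local zone is UTC+2:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class DecodeBatchId {
    // Assumption: log timestamps are local time at UTC+2, inferred from the report.
    static final ZoneOffset LOG_OFFSET = ZoneOffset.ofHours(2);

    // Hypothetical helper: returns the numeric prefix before the first '-',
    // assumed to be the generation time in seconds since the Unix epoch.
    static long batchEpochSeconds(String batchId) {
        int dash = batchId.indexOf('-');
        if (dash <= 0) {
            throw new IllegalArgumentException("unexpected batch id: " + batchId);
        }
        return Long.parseLong(batchId.substring(0, dash));
    }

    // Converts the assumed timestamp prefix to local log time.
    static LocalDateTime batchTimeAsLocal(String batchId) {
        return LocalDateTime.ofInstant(Instant.ofEpochSecond(batchEpochSeconds(batchId)), LOG_OFFSET);
    }

    public static void main(String[] args) {
        // Batch id copied from the GeneratorJob output above.
        System.out.println(batchTimeAsLocal("1401789338-222512082"));
    }
}
```

Run as-is, it prints 2014-06-03T11:55:38, the same second GeneratorJob reports as its start time.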

--
This message was sent by Atlassian JIRA
(v6.2#6252)
