[https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526501#comment-13526501]
Lewis John McGibbney commented on NUTCH-1477:
---------------------------------------------
With the most recent avsc and the most recent patch in GORA-174, the Java classes for WebPage, ParseStatus and ProtocolStatus now compile properly, with the correct getters and setters. I am happy with that part.
The next problem is that when I now attempt to inject a URL list into DataFileAvroStore, I get the following:
{code}
java.lang.NullPointerException
at org.apache.avro.specific.SpecificDatumWriter.getField(SpecificDatumWriter.java:48)
at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:89)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:89)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:55)
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)
at org.apache.gora.avro.store.DataFileAvroStore.put(DataFileAvroStore.java:54)
at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:60)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:185)
at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:85)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
{code}
I think I will take this one over to user@avro, as I don't know enough about the 1.3.3 codebase and have no immediate ideas beyond the few optimistic attempts I have already made to get things working. Even just confirmation from someone else that injecting produces the above stack trace would let us establish that this behavior is consistent.
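Both traces end with Avro's writer dereferencing a value it expected to be present: `BinaryEncoder.writeString` in the original report, and `SpecificDatumWriter.getField` on a nested record here (note the two `writeRecord` frames). One plausible reading, which is an assumption and not confirmed in this thread, is that a WebPage field declared in webpage.avsc without a `null` union, such as a nested ParseStatus or ProtocolStatus record, is still null when `DataFileAvroStore.put` writes it. The sketch below is a hypothetical pre-write check for narrowing that down. It is plain Java rather than the Avro API: it models a record as a field-name-to-value map and takes the set of fields the schema declares as nullable. `NullFieldCheck`, `findNullRequiredFields` and the example field values are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical diagnostic, not part of Nutch, Gora or Avro. It reports fields that
 * hold null even though the schema does not declare them as a union with "null";
 * writing such a field is one way an Avro datum writer can throw NullPointerException.
 */
public class NullFieldCheck {

    /**
     * @param record         field name -> current value (a simplified stand-in for a WebPage)
     * @param nullableFields names of fields whose schema type is a union that includes "null"
     * @return names of null, non-nullable fields, in the record's iteration order
     */
    public static List<String> findNullRequiredFields(Map<String, Object> record,
                                                      Set<String> nullableFields) {
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, Object> e : record.entrySet()) {
            if (e.getValue() == null && !nullableFields.contains(e.getKey())) {
                offenders.add(e.getKey());
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // Illustrative record: the values and nullability are assumptions, not read from webpage.avsc.
        Map<String, Object> page = new LinkedHashMap<>();
        page.put("baseUrl", "http://example.com/");
        page.put("title", null);           // declared nullable in this example
        page.put("parseStatus", null);     // nested record left unset
        page.put("protocolStatus", null);  // nested record left unset
        Set<String> nullable = new HashSet<>(Arrays.asList("title"));
        System.out.println(findNullRequiredFields(page, nullable));
    }
}
```

Running `main` prints `[parseStatus, protocolStatus]`. Against the real store, the nullable set would have to be taken from the actual webpage.avsc, and the map filled from the WebPage being injected.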
> NPE when injecting with DataFileAvroStore
> -----------------------------------------
>
> Key: NUTCH-1477
> URL: https://issues.apache.org/jira/browse/NUTCH-1477
> Project: Nutch
> Issue Type: Bug
> Components: storage
> Affects Versions: 2.1
> Environment: Java 1.6.0_35
> Reporter: Mike Baranczak
> Assignee: Julien Nioche
> Priority: Critical
> Fix For: 2.2
>
> Attachments: webpage.avsc, webpage.avsc, webpage.avsc
>
>
> Fresh installation of Nutch 2.1, configured to use DataFileAvroStore.
> The injection job throws a NullPointerException (see below). There is no error
> when I switch to MemStore.
> java.lang.NullPointerException
> at org.apache.avro.io.BinaryEncoder.writeString(BinaryEncoder.java:133)
> at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:176)
> at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:171)
> at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:72)
> at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:89)
> at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
> at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:55)
> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)
> at org.apache.gora.avro.store.DataFileAvroStore.put(DataFileAvroStore.java:54)
> at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:60)
> at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
> at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:185)
> at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:85)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
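The original report notes that injection succeeds with MemStore but fails with DataFileAvroStore. A likely explanation, assuming the in-memory store keeps object references without serializing them (as Gora's MemStore does), is that only the file-backed store passes field values through Avro's binary encoder, so a null string fails only there. The sketch below models that contrast in plain Java: a `HashMap` stands in for the in-memory store, and `encodeAvroString` encodes a string the way the Avro binary format specifies (zig-zag varint length followed by UTF-8 bytes). `StoreContrast` and `encodeAvroString` are hypothetical names, not Gora or Avro classes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration, not Gora or Avro code. */
public class StoreContrast {

    // Stand-in for an in-memory store: keeps references and never serializes,
    // so a null value is accepted.
    static final Map<String, String> memStore = new HashMap<>();

    // Stand-in for a serializing store: encodes a string per the Avro binary spec
    // (length as a zig-zag varint, then the UTF-8 bytes). A null argument throws
    // NullPointerException before anything is written.
    static byte[] encodeAvroString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8); // NPE here when s == null
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = utf8.length;
        long zz = (n << 1) ^ (n >> 63);                   // zig-zag encode the length
        while ((zz & ~0x7FL) != 0) {                      // emit 7 bits at a time,
            out.write((int) ((zz & 0x7F) | 0x80));        // high bit = "more bytes follow"
            zz >>>= 7;
        }
        out.write((int) zz);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        memStore.put("http://example.com/", null);        // accepted by the in-memory map
        System.out.println("in-memory store: stored null");
        try {
            encodeAvroString(null);
        } catch (NullPointerException e) {
            System.out.println("serializing store: NullPointerException");
        }
    }
}
```

Storing the null in the map succeeds, while encoding it fails at the very first step, which is consistent with the `BinaryEncoder.writeString` frame at the top of the reported trace.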
--
This message is automatically generated by JIRA.