[ 
https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484148#comment-13484148
 ] 

Julien Nioche commented on NUTCH-1477:
--------------------------------------

I found in 
http://mail-archives.apache.org/mod_mbox/avro-user/200910.mbox/%3c4ae78503.50...@apache.org%3E
that we probably need to explicitly allow null values in the schema (see 
the attached webpage.avsc). 
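
For illustration, Avro normally expresses a nullable field as a union that 
includes "null". The fragment below is only a minimal sketch of that idea: the 
record and field names are hypothetical and are not taken from the attached 
webpage.avsc.

```json
{
  "type": "record",
  "name": "ExamplePage",
  "doc": "Hypothetical record, for illustration only; not the Nutch WebPage schema.",
  "fields": [
    {
      "name": "requiredText",
      "type": "string",
      "doc": "Plain string type: a null value here cannot be serialized."
    },
    {
      "name": "optionalText",
      "type": ["null", "string"],
      "default": null,
      "doc": "Union with null: the field may be left unset. The default must match the first branch of the union."
    }
  ]
}
```

Whether every string field in the real schema should become such a union is a 
separate question; the sketch only shows the syntax.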

I tried recompiling the schemas with {{ant compile-avro-schema}}, but the 
generated classes do not compile and are nowhere near as complete as the 
original ones. More worryingly, the same happens when I recompile the 
original schema. I had assumed that the code in org.apache.nutch.storage 
could be generated from the schemas.

Any ideas?
                
> NPE when injecting with DataFileAvroStore
> -----------------------------------------
>
>                 Key: NUTCH-1477
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1477
>             Project: Nutch
>          Issue Type: Bug
>          Components: storage
>    Affects Versions: 2.1
>         Environment: Java 1.6.0_35
>            Reporter: Mike Baranczak
>            Assignee: Julien Nioche
>             Fix For: 2.2
>
>         Attachments: webpage.avsc
>
>
> Fresh installation of Nutch 2.1, configured to use DataFileAvroStore. 
> Injection job throws NullPointerException, see below. No error when I switch 
> to MemStore.
> java.lang.NullPointerException
>       at org.apache.avro.io.BinaryEncoder.writeString(BinaryEncoder.java:133)
>       at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:176)
>       at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:171)
>       at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:72)
>       at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:89)
>       at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
>       at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:55)
>       at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)
>       at org.apache.gora.avro.store.DataFileAvroStore.put(DataFileAvroStore.java:54)
>       at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:60)
>       at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
>       at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>       at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:185)
>       at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:85)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
