Hello Folks,

We are trying to write an Avro-based HDFS consumer that can read offsets from 
ZooKeeper. We already have a producer that writes Avro messages. We are trying 
to use the Avro MapReduce API and are running into challenges. Additionally, 
there is a big schema management piece here, as you can imagine. Ideally there 
should be a schema service that both the producer and the consumer can use.
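
To make the schema service idea concrete, here is a minimal sketch of the sort 
of thing we have in mind. It is only an illustration, not an existing API: the 
class SchemaRegistrySketch, its method names, and the choice of an MD5 
fingerprint of the schema JSON as the schema id are all assumptions. The 
producer would prefix each Avro-encoded message with the schema id, and the 
consumer would use that id to look up the writer's schema before decoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a schema service shared by producer and consumer. */
class SchemaRegistrySketch {
    // Assumption: the schema id is the 16-byte MD5 digest of the schema JSON text.
    private static final int ID_LENGTH = 16;

    // Hex-encoded schema id -> schema JSON text.
    private final Map<String, String> schemasById = new ConcurrentHashMap<>();

    /** Registers a schema and returns its id. */
    byte[] register(String schemaJson) {
        byte[] id = md5(schemaJson);
        schemasById.put(toHex(id), schemaJson);
        return id;
    }

    /** Producer side: prefix the already-encoded Avro payload with the schema id. */
    byte[] frame(byte[] schemaId, byte[] avroPayload) {
        if (schemaId.length != ID_LENGTH) {
            throw new IllegalArgumentException("Schema id must be " + ID_LENGTH + " bytes");
        }
        return ByteBuffer.allocate(ID_LENGTH + avroPayload.length)
                .put(schemaId)
                .put(avroPayload)
                .array();
    }

    /** Consumer side: look up the writer's schema from the id prefix. */
    String schemaFor(byte[] framedMessage) {
        String id = toHex(Arrays.copyOfRange(framedMessage, 0, ID_LENGTH));
        String schemaJson = schemasById.get(id);
        if (schemaJson == null) {
            throw new IllegalStateException("Unknown schema id " + id);
        }
        return schemaJson;
    }

    /** Consumer side: strip the id prefix, leaving the Avro payload to decode. */
    byte[] payloadOf(byte[] framedMessage) {
        return Arrays.copyOfRange(framedMessage, ID_LENGTH, framedMessage.length);
    }

    private static byte[] md5(String text) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

In a real deployment the map would sit behind a shared service rather than in 
the same JVM, and the consumer would pass the looked-up schema to the Avro 
decoder as the writer's schema.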

Any thoughts or ideas on the Avro-based consumer would be greatly appreciated. 
If there is sample Avro-based consumer code, or anything else we could look at 
to debug the issue, that would help as well. Finally, there was talk of 
LinkedIn open-sourcing its Avro schema management piece. Has this already 
happened, or is it in progress?

Exception stack trace when we run the map-only job (just for context):

Additional Context:
org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: in com.rr.avro.ViewEvent in union null of union in field viewGuid of com.rr.avro.ViewEvent
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:261)
    at org.apache.avro.mapred.AvroOutputFormat$1.write(AvroOutputFormat.java:122)
    at org.apache.avro.mapred.AvroOutputFormat$1.write(AvroOutputFormat.java:119)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:706)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:499)
    at com.rr.hadoop.consumer.HadoopConsumer$KafkaAvroMapper.map(HadoopConsumer.java:92)
    at com.rr.hadoop.consumer.HadoopConsumer$KafkaAvroMapper.map(HadoopConsumer.java:68)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessC
attempt_201207041152_56118_m_000000_0: KafkaContext

Thanks,
murtaza
