Hi all,

I followed the suggestions, looked at how Avro is wired into mappers and reducers in the code, and wrote a simple class for Avro multiple outputs. If you are interested in looking at or reviewing it, follow this link: http://pastebin.com/HMPfgttg
Any suggestions and comments are highly appreciated.

Vyacheslav

On Jul 30, 2011, at 7:26 PM, Jason wrote:

> You can extend/customize MultipleOutputs and pass schema related settings via
> properties prefixed with MO name, just like it is done with format classes
> there.
>
> Also to send a dummy key or value why not just to use NullWritable? It's
> efficient as it does not consume any space.
>
> Sent from my iPhone
>
> On Jul 26, 2011, at 5:46 AM, Vyacheslav Zholudev
> <vyacheslav.zholu...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm using the avro format both for input and output, for a mapper and a
>> reducer. I would like to output multiple avro items with different schemata.
>> For sequence files I would use the MultipleOutputs class from the mapreduce
>> package.
>>
>> I looked into the same class but from the old package "mapred" and realized
>> that I can pass an AvroOutputFormat.class parameter when adding another
>> output. However, I didn't manage to figure out how to provide an avro schema
>> for each output. Moreover, when writing to output, I need to provide a key
>> and a value, but in case of avro we usually just pass a specific avro
>> object. All above makes me think that the old MultipleOutputs API wouldn't
>> work with avro files. Am I right?
>>
>> Any pointers of how to output multiple avro records in the same reducer are
>> appreciated.
>>
>> P.S. Another thought was to create an avro schema of type union that will
>> contain all possible output schemata, but I would like to avoid that.
>>
>> Thanks in advance!!!
>>
>> --
>> Best,
>> Vyacheslav
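
For anyone following the thread, here is a minimal sketch of the property-prefix idea from Jason's reply: store one schema per named output in the job configuration, keyed by the output name. It is not the pastebin class. To keep it runnable without Hadoop or Avro on the classpath, java.util.Properties stands in for the job configuration, and the class name NamedOutputSchemas and the key layout "mo.namedOutput.<name>.avro.schema" are hypothetical names chosen for illustration.

```java
import java.util.Properties;

/**
 * Sketch only: keeps one Avro schema (as JSON text) per named output.
 * java.util.Properties is a stand-in for the Hadoop job configuration.
 */
final class NamedOutputSchemas {
    // Hypothetical key layout: <prefix><named output><suffix>.
    private static final String PREFIX = "mo.namedOutput.";
    private static final String SCHEMA_SUFFIX = ".avro.schema";

    private NamedOutputSchemas() {}

    /** Builds the configuration key that holds the schema of one named output. */
    static String schemaKey(String namedOutput) {
        if (namedOutput == null || namedOutput.isEmpty()) {
            throw new IllegalArgumentException("named output must not be empty");
        }
        return PREFIX + namedOutput + SCHEMA_SUFFIX;
    }

    /** Called at job setup time: registers the schema JSON for a named output. */
    static void setSchema(Properties conf, String namedOutput, String schemaJson) {
        conf.setProperty(schemaKey(namedOutput), schemaJson);
    }

    /** Called in the task: looks up the schema JSON for a named output. */
    static String getSchema(Properties conf, String namedOutput) {
        String json = conf.getProperty(schemaKey(namedOutput));
        if (json == null) {
            throw new IllegalStateException(
                "no schema registered for named output '" + namedOutput + "'");
        }
        return json;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Two named outputs, each with its own schema.
        setSchema(conf, "users",
            "{\"type\":\"record\",\"name\":\"User\","
            + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");
        setSchema(conf, "events",
            "{\"type\":\"record\",\"name\":\"Event\","
            + "\"fields\":[{\"name\":\"kind\",\"type\":\"string\"}]}");
        System.out.println(getSchema(conf, "users"));
        System.out.println(getSchema(conf, "events"));
    }
}
```

In a real job the stored JSON would be parsed back into an Avro schema inside the task, and the writer for each named output would be created with that schema. This avoids the union-of-all-schemata workaround mentioned in the original question.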