Hi all,

I tried to follow the suggestions, looked at how Avro is wired into mappers 
and reducers, and created a simple class for Avro multiple outputs. If you are 
interested in looking at it or reviewing it, you can follow the link:
http://pastebin.com/HMPfgttg
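For anyone who doesn't follow the link, here is a rough sketch of the general idea as I understand the pattern. It is not the pastebin class itself. Each named output is registered with its own schema, and its writer is created lazily on the first write, similar to how Hadoop's MultipleOutputs caches one record writer per named output. The class name NamedOutputsSketch and its methods are hypothetical. In-memory lists stand in for real Avro file writers so the sketch runs without Hadoop or Avro on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one lazily created "writer" per named output, each bound
// to its own schema. In-memory lists stand in for real Avro file writers.
public class NamedOutputsSketch {
    private final Map<String, String> schemas = new LinkedHashMap<>();
    private final Map<String, List<Object>> writers = new LinkedHashMap<>();

    // Register a named output together with the schema (as JSON text) it uses.
    public void addNamedOutput(String name, String schemaJson) {
        if (schemas.putIfAbsent(name, schemaJson) != null) {
            throw new IllegalArgumentException("Named output already defined: " + name);
        }
    }

    // Write a single datum (no key/value pair) to the given named output.
    public void write(String name, Object datum) {
        if (!schemas.containsKey(name)) {
            throw new IllegalArgumentException("Undefined named output: " + name);
        }
        // The writer for this output is created only on its first use.
        writers.computeIfAbsent(name, n -> new ArrayList<>()).add(datum);
    }

    public String schemaOf(String name) {
        return schemas.get(name);
    }

    // Records written so far; empty if the output was never written to.
    public List<Object> recordsOf(String name) {
        return writers.getOrDefault(name, List.of());
    }

    public static void main(String[] args) {
        NamedOutputsSketch mos = new NamedOutputsSketch();
        mos.addNamedOutput("users", "{\"type\":\"record\",\"name\":\"User\",\"fields\":[]}");
        mos.addNamedOutput("clicks", "{\"type\":\"record\",\"name\":\"Click\",\"fields\":[]}");
        mos.write("users", "alice");
        mos.write("clicks", "click-1");
        mos.write("clicks", "click-2");
        System.out.println(mos.recordsOf("users").size() + " " + mos.recordsOf("clicks").size()); // prints "1 2"
    }
}
```

Each call to write takes a single datum rather than a key/value pair, which matches how Avro jobs usually pass one object per record.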

Any suggestions and comments are highly appreciated.

Vyacheslav

On Jul 30, 2011, at 7:26 PM, Jason wrote:

> You can extend/customize MultipleOutputs and pass schema-related settings via 
> properties prefixed with the named output (MO) name, just as is done with the 
> format classes there.
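Here is a minimal sketch of the prefixed-property idea, assuming one schema (as JSON text) per named output. A plain Map stands in for Hadoop's Configuration. The key layout mo.namedOutput.<name>.avro.schema is a hypothetical choice for illustration, not a property that MultipleOutputs or Avro defines.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: store one schema per named output under a key prefixed
// with that output's name. A Map stands in for Hadoop's Configuration.
public class SchemaProperties {
    static final String PREFIX = "mo.namedOutput.";      // assumed prefix
    static final String SCHEMA_SUFFIX = ".avro.schema";  // hypothetical suffix

    public static String schemaKey(String namedOutput) {
        return PREFIX + namedOutput + SCHEMA_SUFFIX;
    }

    // Driver side: record the schema for a named output.
    public static void setOutputSchema(Map<String, String> conf, String namedOutput, String schemaJson) {
        conf.put(schemaKey(namedOutput), schemaJson);
    }

    // Task side: look the schema up again before opening that output's writer.
    public static String getOutputSchema(Map<String, String> conf, String namedOutput) {
        String schema = conf.get(schemaKey(namedOutput));
        if (schema == null) {
            throw new IllegalStateException("No schema configured for " + namedOutput);
        }
        return schema;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setOutputSchema(conf, "users", "{\"type\":\"record\",\"name\":\"User\",\"fields\":[]}");
        System.out.println(schemaKey("users"));
        System.out.println(getOutputSchema(conf, "users"));
    }
}
```

With a real Configuration, the same pattern would use conf.set(key, json) in the driver and conf.get(key) in the task. The JSON could then be turned into a Schema with Avro's schema parser (new Schema.Parser().parse(json) in recent Avro releases).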
> 
> Also, to send a dummy key or value, why not just use NullWritable? It's 
> efficient, as it does not consume any space.
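To illustrate why NullWritable costs nothing: its write method emits no bytes, and NullWritable.get() always returns the same instance. The class below is a hypothetical stand-in (NullPlaceholder) that only mimics that behavior without Hadoop, so the zero-byte claim can be checked directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in mimicking Hadoop's NullWritable: a singleton whose
// serialized form is empty. It does not depend on Hadoop.
public final class NullPlaceholder {
    private static final NullPlaceholder INSTANCE = new NullPlaceholder();

    private NullPlaceholder() {}

    // Always the same instance, so no per-record object is allocated.
    public static NullPlaceholder get() {
        return INSTANCE;
    }

    // Writes nothing: the placeholder occupies zero bytes when serialized.
    public void write(DataOutput out) throws IOException {
        // intentionally empty
    }

    // Serialize a value into memory and report how many bytes it took.
    public static int serializedSize(NullPlaceholder value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            value.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println(serializedSize(NullPlaceholder.get())); // prints 0
    }
}
```

In a real job you would simply pass NullWritable.get() wherever a key or value is required.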
> 
> On Jul 26, 2011, at 5:46 AM, Vyacheslav Zholudev 
> <vyacheslav.zholu...@gmail.com> wrote:
> 
>> Hi,
>> 
>> I'm using the Avro format for both input and output, in a mapper and a 
>> reducer. I would like to output multiple Avro items with different schemata. 
>> For sequence files I would use the MultipleOutputs class from the mapreduce 
>> package.
>> 
>> I looked into the same class in the old "mapred" package and realized that 
>> I can pass AvroOutputFormat.class as a parameter when adding another output. 
>> However, I couldn't figure out how to provide an Avro schema for each 
>> output. Moreover, when writing to an output, I need to provide a key and a 
>> value, but with Avro we usually pass just a specific Avro object. All of the 
>> above makes me think that the old MultipleOutputs API wouldn't work with 
>> Avro files. Am I right?
>> 
>> Any pointers on how to output multiple Avro records from the same reducer 
>> are appreciated.
>> 
>> P.S. Another thought was to create an Avro schema of type union that would 
>> contain all possible output schemata, but I would like to avoid that.
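For comparison, here is a sketch of the union alternative from the P.S. In Avro's JSON schema syntax a union is written as an array of its branch schemas, so a single file schema can cover several record types as long as each record has a distinct name. The helper unionOf is hypothetical and only concatenates schema text; a real job should validate the result with an Avro schema parser.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: build the JSON text of an Avro union schema from the
// JSON text of its branches. No validation is done here.
public class UnionSchemaSketch {
    public static String unionOf(List<String> branchSchemas) {
        StringJoiner union = new StringJoiner(",", "[", "]");
        for (String branch : branchSchemas) {
            union.add(branch);
        }
        return union.toString();
    }

    public static void main(String[] args) {
        // Two record branches with distinct names, as Avro unions require.
        String user = "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}";
        String click = "{\"type\":\"record\",\"name\":\"Click\",\"fields\":[{\"name\":\"url\",\"type\":\"string\"}]}";
        System.out.println(unionOf(List.of(user, click)));
    }
}
```

The cost of this approach is that every consumer of the combined output has to handle all branches, which is one likely reason to prefer separate named outputs.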
>> 
>> Thanks in advance!!!
>> 
>> -- 
>> Best,
>> Vyacheslav
