[ 
https://issues.apache.org/jira/browse/AVRO-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163776#comment-13163776
 ] 

Doug Cutting commented on AVRO-969:
-----------------------------------

AVRO-966 is a bug.  I supplied a patch there.

SpecificDatumWriter might be a bit more efficient.  In general, though, 
reflection is not used for specific objects, even when ReflectDatumReader is 
used.  Rather, ReflectDatumReader detects the specific object and uses 
SpecificDatumReader, but that detection adds a small cost.
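
The detection described above can be sketched roughly as follows. This is a 
minimal, self-contained illustration and not Avro's actual code: SpecificLike, 
User, Point and readFields are hypothetical stand-ins, assumed only for this 
example. The idea is that a type check runs once per datum; "specific" objects 
are read through their indexed accessor, and only other objects fall back to 
reflection.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class ReflectDispatchSketch {

  /** Hypothetical stand-in for a generated ("specific") record type. */
  interface SpecificLike {
    int fieldCount();
    Object get(int index);
  }

  /** A "specific" object: exposes its fields by index, no reflection needed. */
  static final class User implements SpecificLike {
    private final String name;
    private final int age;

    User(String name, int age) {
      this.name = name;
      this.age = age;
    }

    public int fieldCount() {
      return 2;
    }

    public Object get(int index) {
      switch (index) {
        case 0: return name;
        case 1: return age;
        default: throw new IndexOutOfBoundsException("field " + index);
      }
    }
  }

  /** A plain object: its fields can only be discovered by reflection. */
  static final class Point {
    int x;
    int y;

    Point(int x, int y) {
      this.x = x;
      this.y = y;
    }
  }

  /** Reads field values, taking the indexed path when the datum allows it. */
  static List<Object> readFields(Object datum) {
    List<Object> values = new ArrayList<Object>();
    // The detection step: one instanceof check per datum (the "small cost").
    if (datum instanceof SpecificLike) {
      SpecificLike specific = (SpecificLike) datum;
      for (int i = 0; i < specific.fieldCount(); i++) {
        values.add(specific.get(i));
      }
      return values;
    }
    // Fallback: reflective access to the declared instance fields.
    for (Field field : datum.getClass().getDeclaredFields()) {
      if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
        continue;
      }
      try {
        field.setAccessible(true);
        values.add(field.get(datum));
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
    }
    return values;
  }
}
```

Calling readFields(new User("ann", 30)) goes through the indexed accessor, 
while readFields(new Point(1, 2)) takes the reflective branch.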
                
> Make possible usage of SpecificDatumWriter in avro-mapred
> ---------------------------------------------------------
>
>                 Key: AVRO-969
>                 URL: https://issues.apache.org/jira/browse/AVRO-969
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.6.1
>            Reporter: Vyacheslav Zholudev
>
> I realized that ReflectDatumWriter is always used when running a mapred job (in 
> AvroOutputFormat.java). Sometimes this leads to bugs like the one in AVRO-966.
> Why not just provide a property like {{WRITER_IS_REFLECT = 
> "avro.map.writer.is.reflect";}} to decide which DatumWriter should 
> be used?
> I created a small patch to solve this:
> {code:title=avro-mapred.patch|borderStyle=solid}
> Index: lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java
> ===================================================================
> --- lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java  (revision 1209417)
> +++ lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java  (revision )
> @@ -53,6 +53,8 @@
>    /** The configuration key for reflection-based map output representation. */
>    public static final String MAP_OUTPUT_IS_REFLECT = "avro.map.output.is.reflect";
>  
> +  public static final String WRITER_IS_REFLECT = "avro.map.writer.is.reflect";
> +
>    /** Configure a job's map input schema. */
>    public static void setInputSchema(JobConf job, Schema s) {
>      job.set(INPUT_SCHEMA, s.toString());
> Index: lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java
> ===================================================================
> --- lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java (revision 1209417)
> +++ lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java (revision )
> @@ -23,6 +23,7 @@
>  import java.util.Map;
>  import java.net.URLDecoder;
>  
> +import org.apache.avro.specific.SpecificDatumWriter;
>  import org.apache.hadoop.io.NullWritable;
>  import org.apache.hadoop.fs.FileSystem;
>  import org.apache.hadoop.fs.Path;
> @@ -102,8 +103,9 @@
>        ? AvroJob.getMapOutputSchema(job)
>        : AvroJob.getOutputSchema(job);
>  
> -    final DataFileWriter<T> writer =
> -      new DataFileWriter<T>(new ReflectDatumWriter<T>());
> +    final DataFileWriter<T> writer = job.getBoolean(AvroJob.WRITER_IS_REFLECT, false) ?
> +      new DataFileWriter<T>(new ReflectDatumWriter<T>()) :
> +      new DataFileWriter<T>(new SpecificDatumWriter<T>());
>      
>      configureDataFileWriter(writer, job);
> {code}
> Does it make sense? 
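
The selection logic in the patch above can be tried in isolation with a small 
sketch. It assumes java.util.Properties as a stand-in for Hadoop's JobConf, and 
WriterKind and chooseWriter are hypothetical names used only here; the property 
key and the default of false are taken from the patch.

```java
import java.util.Properties;

final class WriterSelectionSketch {

  /** Property key proposed in the patch. */
  static final String WRITER_IS_REFLECT = "avro.map.writer.is.reflect";

  /** Hypothetical label for the DatumWriter that would be constructed. */
  enum WriterKind { REFLECT, SPECIFIC }

  /** Mirrors job.getBoolean(AvroJob.WRITER_IS_REFLECT, false) from the patch. */
  static WriterKind chooseWriter(Properties conf) {
    boolean reflect =
        Boolean.parseBoolean(conf.getProperty(WRITER_IS_REFLECT, "false"));
    return reflect ? WriterKind.REFLECT : WriterKind.SPECIFIC;
  }
}
```

Because the default is false, a job that does not set the property would get 
the specific writer, so with the patch as written, jobs that rely on reflection 
would need to set avro.map.writer.is.reflect to true explicitly.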
