Make possible usage of SpecificDatumWriter in avro-mapred
---------------------------------------------------------
Key: AVRO-969
URL: https://issues.apache.org/jira/browse/AVRO-969
Project: Avro
Issue Type: Improvement
Components: java
Affects Versions: 1.6.1
Reporter: Vyacheslav Zholudev
I realized that ReflectDatumWriter is always used when running mapred job (in
AvroOutputFormat.java). Sometimes it leads to bugs like in AVRO-966.
Why not just provide a property like {{WRITER_IS_REFLECT =
"avro.map.writer.is.reflect";}} to make a decision which DatumWriter should be
used.
I created a small patch to solve this:
{code:title=avro-mapred.patch|borderStyle=solid}
Index: lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java
===================================================================
--- lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java
(revision 1209417)
+++ lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java
(revision )
@@ -53,6 +53,8 @@
/** The configuration key for reflection-based map output representation. */
public static final String MAP_OUTPUT_IS_REFLECT =
"avro.map.output.is.reflect";
+ public static final String WRITER_IS_REFLECT = "avro.map.writer.is.reflect";
+
/** Configure a job's map input schema. */
public static void setInputSchema(JobConf job, Schema s) {
job.set(INPUT_SCHEMA, s.toString());
Index:
lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java
===================================================================
--- lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java
(revision 1209417)
+++ lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java
(revision )
@@ -23,6 +23,7 @@
import java.util.Map;
import java.net.URLDecoder;
+import org.apache.avro.specific.SpecificDatumWriter;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
@@ -102,8 +103,9 @@
? AvroJob.getMapOutputSchema(job)
: AvroJob.getOutputSchema(job);
- final DataFileWriter<T> writer =
- new DataFileWriter<T>(new ReflectDatumWriter<T>());
+ final DataFileWriter<T> writer = job.getBoolean(AvroJob.WRITER_IS_REFLECT,
false) ?
+ new DataFileWriter<T>(new ReflectDatumWriter<T>()) :
+ new DataFileWriter<T>(new SpecificDatumWriter<T>());
configureDataFileWriter(writer, job);
{code}
Does it make sense?
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira