David Mollitor created AVRO-3183:
------------------------------------
Summary: Do Not Double Buffer Data in DataFileWriter
Key: AVRO-3183
URL: https://issues.apache.org/jira/browse/AVRO-3183
Project: Apache Avro
Issue Type: Improvement
Components: java
Affects Versions: 1.10.0
Reporter: David Mollitor
Assignee: David Mollitor
{code:java|title=DataFileWriter.java}
private void init(OutputStream outs) throws IOException {
  this.underlyingStream = outs;
  this.out = new BufferedFileOutputStream(outs);
  EncoderFactory efactory = new EncoderFactory();
  // binaryEncoder returns a buffered Encoder and is wrapping a BufferedFileOutputStream
  this.vout = efactory.binaryEncoder(out, null);
  dout.setSchema(schema);
  buffer = new NonCopyingByteArrayOutputStream(
      Math.min((int) (syncInterval * 1.25), Integer.MAX_VALUE / 2 - 1));
  // binaryEncoder returns a buffered Encoder and is wrapping a NonCopyingByteArrayOutputStream
  this.bufOut = efactory.binaryEncoder(buffer, null);
  if (this.codec == null) {
    this.codec = CodecFactory.nullCodec().createInstance();
  }
  this.isOpen = true;
}
{code}
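{{DataFileWriter}} double-buffers its output: the encoder returned by
{{binaryEncoder}} keeps its own buffer and then writes into streams that are
already buffers ({{BufferedFileOutputStream}} and
{{NonCopyingByteArrayOutputStream}}). This adds redundant overhead, and the
buffering in the encoder returned by {{binaryEncoder}} is fairly simplistic
and does not do as good a job as the buffering in {{BufferedFileOutputStream}}.
Remove this double buffering by using a 'direct' binary encoder
({{directBinaryEncoder}}), which does no buffering of its own.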
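To make the cost of stacking buffers concrete, here is a minimal, stdlib-only
sketch. It is not Avro code: {{CountingBuffer}}, {{copiesFor}}, and the buffer
sizes 2048 and 8192 are hypothetical, chosen only for illustration. The sketch
counts how many times each byte is copied into a buffer when one buffering
layer sits on top of another, compared with a single layer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class DoubleBufferSketch {

  /** Hypothetical, deliberately simple buffer that counts every byte it copies. */
  static final class CountingBuffer extends OutputStream {
    private final OutputStream out;
    private final byte[] buf;
    private int count;
    long bytesCopied;

    CountingBuffer(OutputStream out, int size) {
      this.out = out;
      this.buf = new byte[size];
    }

    @Override
    public void write(int b) throws IOException {
      if (count == buf.length) {
        drain();
      }
      buf[count++] = (byte) b;
      bytesCopied++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
      // Always copy through the local buffer, one slice at a time.
      while (len > 0) {
        if (count == buf.length) {
          drain();
        }
        int n = Math.min(len, buf.length - count);
        System.arraycopy(b, off, buf, count, n);
        count += n;
        off += n;
        len -= n;
        bytesCopied += n;
      }
    }

    /** Pushes the buffered bytes to the next stream and empties the buffer. */
    private void drain() throws IOException {
      out.write(buf, 0, count);
      count = 0;
    }

    @Override
    public void flush() throws IOException {
      drain();
      out.flush();
    }
  }

  /** Writes n single bytes and returns the total number of buffer copies made. */
  static long copiesFor(boolean stacked, int n) {
    try {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      CountingBuffer lower = new CountingBuffer(sink, 8192);
      // Assumption for illustration: a smaller buffer stacked on a larger one.
      CountingBuffer upper = stacked ? new CountingBuffer(lower, 2048) : null;
      OutputStream top = stacked ? upper : lower;
      for (int i = 0; i < n; i++) {
        top.write(i);
      }
      top.flush();
      if (sink.size() != n) {
        throw new IllegalStateException("lost bytes: " + sink.size());
      }
      return lower.bytesCopied + (stacked ? upper.bytesCopied : 0);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("stacked: " + copiesFor(true, 10_000)); // prints "stacked: 20000"
    System.out.println("single:  " + copiesFor(false, 10_000)); // prints "single:  10000"
  }
}
```

Both configurations deliver the same bytes to the sink; the stacked one simply
copies each byte twice. Real buffered streams may bypass their buffer for large
writes, so the actual overhead in {{DataFileWriter}} would need to be measured.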