[ https://issues.apache.org/jira/browse/AVRO-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Skraba resolved AVRO-3183.
-------------------------------
Resolution: Fixed
Thanks for the fix -- this looks like the right thing to do!
For reference, did you happen to measure any performance impact as a result of
this change?
> Do Not Double Buffer Data in DataFileWriter
> -------------------------------------------
>
> Key: AVRO-3183
> URL: https://issues.apache.org/jira/browse/AVRO-3183
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.10.0
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Fix For: 1.11.0
>
>
> {code:java|title=DataFileWriter.java}
> private void init(OutputStream outs) throws IOException {
>   this.underlyingStream = outs;
>   this.out = new BufferedFileOutputStream(outs);
>   EncoderFactory efactory = new EncoderFactory();
>   // binaryEncoder returns a buffered Encoder and is wrapping a BufferedFileOutputStream
>   this.vout = efactory.binaryEncoder(out, null);
>   dout.setSchema(schema);
>   buffer = new NonCopyingByteArrayOutputStream(
>       Math.min((int) (syncInterval * 1.25), Integer.MAX_VALUE / 2 - 1));
>   // binaryEncoder returns a buffered Encoder and is wrapping a NonCopyingByteArrayOutputStream
>   this.bufOut = efactory.binaryEncoder(buffer, null);
>   if (this.codec == null) {
>     this.codec = CodecFactory.nullCodec().createInstance();
>   }
>   this.isOpen = true;
> }
> {code}
> The {{DataFileWriter}} double-buffers its output, which adds redundant
> overhead. The buffering offered by the encoder that {{binaryEncoder}}
> returns is also fairly simplistic and does not do as good a job as the
> buffering in {{BufferedFileOutputStream}}.
> Remove this double buffering by using a 'direct' binary encoder
> ({{EncoderFactory.directBinaryEncoder}}).
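
To make the cost of double buffering concrete, here is a minimal standalone
sketch that uses only java.io. It is not Avro's code: CountingBuffer is a
hypothetical stand-in for any buffering layer, and the 2048 and 8192 byte
buffer sizes are arbitrary assumptions chosen for illustration. It counts how
many bytes get copied into intermediate buffers when one layer is stacked on
another, compared with a single layer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class DoubleBufferSketch {

  /** Hypothetical buffering layer that counts every byte copied into its own buffer. */
  static final class CountingBuffer extends OutputStream {
    private final OutputStream downstream;
    private final byte[] buf;
    private int pos;
    long bytesCopied;

    CountingBuffer(OutputStream downstream, int size) {
      this.downstream = downstream;
      this.buf = new byte[size];
    }

    @Override
    public void write(int b) throws IOException {
      if (pos == buf.length) {
        drain();
      }
      buf[pos++] = (byte) b;
      bytesCopied++;
    }

    @Override
    public void write(byte[] src, int off, int len) throws IOException {
      if (len >= buf.length) {
        // Writes at least as large as the buffer skip it entirely.
        drain();
        downstream.write(src, off, len);
        return;
      }
      if (len > buf.length - pos) {
        drain();
      }
      System.arraycopy(src, off, buf, pos, len);
      pos += len;
      bytesCopied += len;
    }

    /** Pushes any buffered bytes to the next layer. */
    private void drain() throws IOException {
      if (pos > 0) {
        downstream.write(buf, 0, pos);
        pos = 0;
      }
    }

    @Override
    public void flush() throws IOException {
      drain();
      downstream.flush();
    }
  }

  /** Writes n single bytes and returns the total bytes copied into intermediate buffers. */
  static long copiesFor(int n, boolean doubleBuffered) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // Assumed sizes: 8192 for the stream-level buffer, 2048 for the encoder-level buffer.
    CountingBuffer streamBuffer = new CountingBuffer(sink, 8192);
    CountingBuffer encoderBuffer = doubleBuffered ? new CountingBuffer(streamBuffer, 2048) : null;
    OutputStream top = doubleBuffered ? encoderBuffer : streamBuffer;
    for (int i = 0; i < n; i++) {
      top.write(i);
    }
    top.flush();
    if (sink.size() != n) {
      throw new IllegalStateException("sink received " + sink.size() + " bytes, expected " + n);
    }
    return streamBuffer.bytesCopied + (encoderBuffer == null ? 0 : encoderBuffer.bytesCopied);
  }

  public static void main(String[] args) throws IOException {
    System.out.println("single buffer copies: " + copiesFor(10000, false)); // 10000
    System.out.println("double buffer copies: " + copiesFor(10000, true));  // 20000
  }
}
```

With these assumed sizes every byte is copied twice in the stacked case, once
per layer, while the sink receives the same 10000 bytes either way. How much
this matters in a real DataFileWriter would need to be measured.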