David Mollitor created AVRO-4240:
------------------------------------

             Summary: Size DataFileWriter output buffer to fit entire block frame to reduce write syscalls
                 Key: AVRO-4240
                 URL: https://issues.apache.org/jira/browse/AVRO-4240
             Project: Apache Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.12.1, 1.11.5, 1.10.2
            Reporter: David Mollitor
            Assignee: David Mollitor
             Fix For: 1.13.0


DataFileStream.DataBlock#writeBlockTo writes four pieces sequentially through a 
DirectBinaryEncoder into a BufferedFileOutputStream that uses the default 8KB 
BufferedOutputStream buffer:
 # Entry count (varint-encoded long, 1-10 bytes)
 # Block size (varint-encoded long, 1-10 bytes)
 # Compressed block data (~64KB at the default sync interval)
 # Sync marker (16 bytes)

The default sync interval was increased from 16KB to 64KB in AVRO-1398, but the 
BufferedFileOutputStream buffer size was never adjusted. Because the block data 
far exceeds the 8KB buffer, BufferedOutputStream first flushes the buffered 
entry count and block size bytes, then writes the block data directly to the 
underlying stream. The sync marker then goes into the buffer and is flushed 
separately at the end. The result is at least 3 write syscalls per block 
instead of 1.
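
The flush pattern described above can be reproduced outside Avro with a plain 
java.io.BufferedOutputStream wrapped around a sink that counts write calls. 
This is a minimal sketch, not Avro code: BlockWriteCount, CountingSink, 
encodeLong and writesPerBlock are hypothetical names, the entry count of 1000 
is arbitrary, and the block payload is an uncompressed zero-filled array 
standing in for the real block data.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class BlockWriteCount {

    /** Hypothetical sink that counts how often the layer below the buffer is written to. */
    static final class CountingSink extends OutputStream {
        int writes;

        @Override
        public void write(int b) {
            writes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writes++;
        }
    }

    /** Zigzag varint encoding of a long, as used by Avro's binary format (1-10 bytes). */
    static int encodeLong(long n, byte[] buf, int pos) {
        long z = (n << 1) ^ (n >> 63);
        int start = pos;
        while ((z & ~0x7FL) != 0) {
            buf[pos++] = (byte) ((z & 0x7F) | 0x80);
            z >>>= 7;
        }
        buf[pos++] = (byte) z;
        return pos - start;
    }

    /** Writes one block frame through a buffer of the given size and returns the sink write count. */
    static int writesPerBlock(int bufferSize, int blockBytes) {
        CountingSink sink = new CountingSink();
        byte[] varint = new byte[10];
        byte[] data = new byte[blockBytes]; // stand-in for the compressed block data
        byte[] sync = new byte[16];         // stand-in for the sync marker
        try (BufferedOutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            out.write(varint, 0, encodeLong(1000, varint, 0));       // entry count (arbitrary)
            out.write(varint, 0, encodeLong(blockBytes, varint, 0)); // block size
            out.write(data, 0, data.length);                         // block data
            out.write(sync, 0, sync.length);                         // sync marker
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.writes;
    }

    public static void main(String[] args) {
        // 8KB buffer, 64KB block: header flush, direct data write, sync flush.
        System.out.println(writesPerBlock(8192, 64 * 1024)); // prints 3
    }
}
```

Each counted write corresponds to one call into the stream beneath the buffer, 
which for a file stream is where the write syscall would be issued.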

This change sizes the BufferedFileOutputStream buffer to maxBlockSize() + 20 + 
sync.length, where the 20 bytes cover the two varint-encoded longs at up to 10 
bytes each. A complete block frame then fits in the buffer, so all four writes 
accumulate and are flushed once.
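
As a rough illustration of the sizing arithmetic, the sketch below computes the 
frame buffer size and confirms that a zigzag varint long never needs more than 
10 bytes. BlockFrameSize, varintLength and frameBufferSize are hypothetical 
names, not Avro methods, and the 64KB maximum block size is assumed for 
illustration only; in the actual change the value comes from maxBlockSize().

```java
final class BlockFrameSize {

    /** Worst-case length of a zigzag varint-encoded long. */
    static final int MAX_VARINT_LONG_BYTES = 10;

    /** Number of bytes a long occupies as a zigzag varint (1-10). */
    static int varintLength(long n) {
        long z = (n << 1) ^ (n >> 63);
        int len = 1;
        while ((z & ~0x7FL) != 0) {
            z >>>= 7;
            len++;
        }
        return len;
    }

    /** Entry count + block size + block data + sync marker, each at its worst case. */
    static int frameBufferSize(int maxBlockSize, int syncLength) {
        int headers = 2 * MAX_VARINT_LONG_BYTES; // the "+ 20" term
        return Math.addExact(Math.addExact(maxBlockSize, headers), syncLength);
    }

    public static void main(String[] args) {
        System.out.println(varintLength(Long.MIN_VALUE));   // prints 10
        // Assumed 64KB maximum block and 16-byte sync marker.
        System.out.println(frameBufferSize(64 * 1024, 16)); // prints 65572
    }
}
```

Math.addExact is used here only so that an unexpectedly large block size fails 
loudly instead of overflowing to a negative buffer size.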


