iemejia opened a new pull request, #3565:
URL: https://github.com/apache/parquet-java/pull/3565

   Part of #3530 — Apache Parquet Java Performance Improvements
   
   ## Summary
   
   Replace `ByteBufferInputStream` and `LittleEndianDataInputStream` wrappers 
with direct `ByteBuffer` access for all PLAIN value readers and writers.
   
   **Readers** (`PlainValuesReader`, `BooleanPlainValuesReader`, 
`BinaryPlainValuesReader`, `FixedLenByteArrayPlainValuesReader`): hold a 
little-endian `ByteBuffer` from `initFromPage()` and call 
`getInt`/`getLong`/`getFloat`/`getDouble` directly, eliminating per-value 
stream overhead.
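
To illustrate the reader-side idea, here is a minimal sketch of decoding PLAIN values straight from a little-endian `ByteBuffer`. The class `PlainIntReaderSketch` and its method bodies are hypothetical and written for illustration only; they are not the PR's actual `PlainValuesReader` code, and the real `initFromPage()` signature differs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of a PLAIN reader; not the actual parquet-java class. */
final class PlainIntReaderSketch {
    private ByteBuffer page;

    /** Keeps a little-endian view of the page bytes, starting at the given offset. */
    void initFromPage(ByteBuffer pageBytes, int offset) {
        // duplicate() leaves the caller's position and limit untouched.
        ByteBuffer view = pageBytes.duplicate();
        view.position(offset);
        // slice() and duplicate() default to BIG_ENDIAN, so the order is set explicitly.
        this.page = view.slice().order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Reads one INT32 value and advances the buffer position by 4 bytes. */
    int readInteger() {
        return page.getInt();
    }

    /** Reads one INT64 value and advances the buffer position by 8 bytes. */
    long readLong() {
        return page.getLong();
    }

    /** Skips n INT32 values by moving the position, without decoding them. */
    void skipIntegers(int n) {
        page.position(page.position() + 4 * n);
    }
}
```

Each read is a single absolute-order `ByteBuffer` call with no intermediate stream object, which is the kind of per-value overhead the change aims to remove.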
   
   **Writers** (`PlainValuesWriter`, `BooleanPlainValuesWriter`, 
`FixedLenByteArrayPlainValuesWriter`): write through 
`CapacityByteArrayOutputStream`'s new `writeInt`/`writeLong` methods which put 
values directly into the NIO slab buffer in little-endian order, avoiding 
temporary byte-array allocation.
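
The writer-side idea can be sketched the same way. `LittleEndianSlabSketch` below is a hypothetical stand-in for the slab handling in `CapacityByteArrayOutputStream`: it assumes a single heap buffer that grows by copying, which is a simplification chosen for brevity and not how the real class manages its slabs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical single-slab sketch; not the actual CapacityByteArrayOutputStream. */
final class LittleEndianSlabSketch {
    private ByteBuffer slab;

    LittleEndianSlabSketch(int initialCapacity) {
        // Allocating the slab as little-endian lets putInt/putLong emit PLAIN byte order.
        slab = ByteBuffer.allocate(initialCapacity).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Writes 4 bytes directly into the slab, with no temporary byte array. */
    void writeInt(int v) {
        ensureRemaining(4);
        slab.putInt(v);
    }

    /** Writes 8 bytes directly into the slab, with no temporary byte array. */
    void writeLong(long v) {
        ensureRemaining(8);
        slab.putLong(v);
    }

    /** Simplified growth policy (an assumption): copy into a larger buffer. */
    private void ensureRemaining(int n) {
        if (slab.remaining() >= n) {
            return;
        }
        int newCapacity = Math.max(slab.capacity() * 2, slab.position() + n);
        ByteBuffer bigger = ByteBuffer.allocate(newCapacity).order(ByteOrder.LITTLE_ENDIAN);
        slab.flip();
        bigger.put(slab);
        slab = bigger;
    }

    /** Returns a copy of the bytes written so far. */
    byte[] toByteArray() {
        ByteBuffer view = slab.duplicate();
        view.flip();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }
}
```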
   
   **Supporting changes**:
   - `CapacityByteArrayOutputStream`: allocate slabs with 
`ByteOrder.LITTLE_ENDIAN`, add `writeInt(int)` and `writeLong(long)` for 
single-value NIO writes.
   - `BytesInput`: add zero-copy `writeTo(ByteBuffer)` and `toByteArray()` 
using bulk `ByteBuffer.get()` instead of stream copy.
   - `LittleEndianDataOutputStream`: batch the single-byte writes in 
`writeShort`/`writeInt` into one `write(buf, 0, N)` call per value.
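
As a rough illustration of the last item, the sketch below packs a value into a reusable scratch array and issues one `write(buf, 0, N)` call instead of N single-byte writes. The class name, the `scratch` field, and the `encodeInt` helper are hypothetical; the actual `LittleEndianDataOutputStream` implementation may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch of batched little-endian writes; not the actual parquet-java class. */
final class BatchedLittleEndianWriterSketch {
    private final OutputStream out;
    // Reused for every value, so no per-call allocation is needed.
    private final byte[] scratch = new byte[4];

    BatchedLittleEndianWriterSketch(OutputStream out) {
        this.out = out;
    }

    /** Writes the low 16 bits of v, least significant byte first, in one call. */
    void writeShort(int v) throws IOException {
        scratch[0] = (byte) v;
        scratch[1] = (byte) (v >>> 8);
        out.write(scratch, 0, 2);
    }

    /** Writes v as 4 bytes, least significant byte first, in one call. */
    void writeInt(int v) throws IOException {
        scratch[0] = (byte) v;
        scratch[1] = (byte) (v >>> 8);
        scratch[2] = (byte) (v >>> 16);
        scratch[3] = (byte) (v >>> 24);
        out.write(scratch, 0, 4);
    }

    /** Convenience helper for the usage example: encodes one int to a byte array. */
    static byte[] encodeInt(int v) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            new BatchedLittleEndianWriterSketch(sink).writeInt(v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```

For example, `BatchedLittleEndianWriterSketch.encodeInt(0x11223344)` yields the bytes `44 33 22 11` while calling the underlying stream once.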
   
   Includes JMH benchmarks (`PlainEncodingBenchmark`, `PlainDecodingBenchmark`) 
covering all 7 primitive types for both encoding and decoding.
   
   ## Benchmark results
   
   **Environment**: JDK 25.0.3 (Temurin), OpenJDK 64-Bit Server VM, JMH 1.37, 
Linux x86_64.
   
   Decoding (100K values/iteration, 3 forks x 5 iterations, throughput mode):
   
   | Benchmark | Master (M ops/s) | Branch (M ops/s) | Speedup |
   |---|---:|---:|---:|
   | decodeInt | 425 | 5,427 | **12.8x** |
   | decodeFloat | 416 | 5,440 | **13.1x** |
   | decodeLong | 119 | 4,720 | **39.5x** (\*) |
   | decodeDouble | 116 | 6,026 | **51.8x** (\*) |
   | decodeBoolean | 639 | 1,642 | **2.6x** |
   | decodeFlba (len=2,12,16) | 188 | 680 | **3.6x** |
   | decodeBinary (len=10,100,1000) | 142 | 225-230 | **1.6x** |
   
   Encoding:
   
   | Benchmark | Master (M ops/s) | Branch (M ops/s) | Speedup |
   |---|---:|---:|---:|
   | encodeInt | 148 | 559 | **3.8x** |
   | encodeFloat | 150 | 532 | **3.5x** |
   | encodeLong | 193 | 478 | **2.5x** |
   | encodeDouble | 179 | 439 | **2.4x** |
   | encodeBoolean | 850 | 1,692 | **2.0x** |
   | encodeBinary (len=10) | 76 | 150 | **2.0x** |
   | encodeFlba (len=2-16) | 156-184 | 178-224 | **1.1-1.2x** |
   
   (\*) `decodeLong` and `decodeDouble` show JIT variance across forks (error 
bars >20%); the true steady-state speedup is likely ~13x, consistent with 
INT32/FLOAT.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

