Re: [I] parquet: ByteArrayEncoder allocates large unused FallbackEncoder for Parquet 2 [arrow-rs]

via GitHub Thu, 16 May 2024 08:55:29 -0700


AdamGS commented on issue #5755:
URL: https://github.com/apache/arrow-rs/issues/5755#issuecomment-2115613955


   I changed `MAX_BIT_WRITER_SIZE` to 1MB, and the benchmarks on my M1 MBP look
mostly fine: some are faster, some are slower, and none shows a significant swing.
   
   @tustvold do you think that's a valuable contribution (plus a name change
for the const)?
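
For context, here is a minimal sketch of the kind of change being benchmarked: a constant that sets how many bytes a writer reserves up front. Only the name `MAX_BIT_WRITER_SIZE` comes from this thread. `SketchBitWriter` and its methods are hypothetical stand-ins, not the arrow-rs types, and reading "1MB" as 1 MiB is an assumption.

```rust
// `MAX_BIT_WRITER_SIZE` is the constant named in this comment; 1MB is taken
// here to mean 1 MiB, which is an assumption.
const MAX_BIT_WRITER_SIZE: usize = 1024 * 1024;

// Hypothetical stand-in for an encoder's bit buffer; not the arrow-rs type.
struct SketchBitWriter {
    buffer: Vec<u8>,
}

impl SketchBitWriter {
    // Requests `capacity` bytes up front, whether or not they are ever written.
    fn new(capacity: usize) -> Self {
        Self {
            buffer: Vec::with_capacity(capacity),
        }
    }

    // Appends one byte to the buffer.
    fn put_byte(&mut self, byte: u8) {
        self.buffer.push(byte);
    }

    // Bytes reserved for the buffer (at least what was requested).
    fn reserved(&self) -> usize {
        self.buffer.capacity()
    }

    // Bytes actually written so far.
    fn written(&self) -> usize {
        self.buffer.len()
    }
}
```

An encoder that is constructed but never used would still hold `reserved()` bytes while `written()` stays at zero, so a smaller constant reduces the up-front cost.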
   
   ```
   Benchmarking write_batch primitive/4096 values primitive: Warming up for 
3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 5.2s, enable flat sampling, or reduce sample count to 60.
   write_batch primitive/4096 values primitive
                           time:   [1.0027 ms 1.0207 ms 1.0453 ms]
                           thrpt:  [168.32 MiB/s 172.36 MiB/s 175.46 MiB/s]
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   write_batch primitive/4096 values primitive with bloom filter
                           time:   [6.3078 ms 6.6004 ms 6.9748 ms]
                           thrpt:  [25.224 MiB/s 26.655 MiB/s 27.891 MiB/s]
   Found 14 outliers among 100 measurements (14.00%)
     6 (6.00%) high mild
     8 (8.00%) high severe
   write_batch primitive/4096 values primitive non-null
                           time:   [848.44 µs 862.57 µs 878.64 µs]
                           thrpt:  [196.35 MiB/s 200.00 MiB/s 203.33 MiB/s]
   Found 8 outliers among 100 measurements (8.00%)
     6 (6.00%) high mild
     2 (2.00%) high severe
   write_batch primitive/4096 values primitive non-null with bloom filter
                           time:   [5.7671 ms 5.9049 ms 6.0665 ms]
                           thrpt:  [28.437 MiB/s 29.216 MiB/s 29.914 MiB/s]
   Found 12 outliers among 100 measurements (12.00%)
     3 (3.00%) high mild
     9 (9.00%) high severe
   write_batch primitive/4096 values bool
                           time:   [134.33 µs 142.90 µs 155.07 µs]
                           thrpt:  [6.8386 MiB/s 7.4212 MiB/s 7.8943 MiB/s]
   Found 7 outliers among 100 measurements (7.00%)
     2 (2.00%) high mild
     5 (5.00%) high severe
   write_batch primitive/4096 values bool non-null
                           time:   [100.22 µs 104.75 µs 110.49 µs]
                           thrpt:  [5.1786 MiB/s 5.4627 MiB/s 5.7098 MiB/s]
   Found 10 outliers among 100 measurements (10.00%)
     4 (4.00%) high mild
     6 (6.00%) high severe
   write_batch primitive/4096 values string
                           time:   [472.16 µs 479.04 µs 487.28 µs]
                           thrpt:  [163.05 MiB/s 165.86 MiB/s 168.27 MiB/s]
   Found 8 outliers among 100 measurements (8.00%)
     5 (5.00%) high mild
     3 (3.00%) high severe
   Benchmarking write_batch primitive/4096 values string with bloom filter: 
Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 9.7s, enable flat sampling, or reduce sample count to 50.
   write_batch primitive/4096 values string with bloom filter
                           time:   [1.9046 ms 1.9504 ms 2.0007 ms]
                           thrpt:  [39.713 MiB/s 40.736 MiB/s 41.715 MiB/s]
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   write_batch primitive/4096 values string dictionary
                           time:   [284.22 µs 291.52 µs 299.59 µs]
                           thrpt:  [159.75 MiB/s 164.17 MiB/s 168.39 MiB/s]
   Found 6 outliers among 100 measurements (6.00%)
     5 (5.00%) high mild
     1 (1.00%) high severe
   (⎈ |arn:aws:eks:us-east-2:115740606080:cluster/cluster-dev-aws-5:default)➜  
parquet git:(master) ✗ cargo bench
      Compiling parquet v50.0.0 (/Users/adamgs/Code/arrow-rs/parquet)
   warning: unused import: 
`fixed_len_byte_array::make_fixed_len_byte_array_reader`
     --> parquet/src/arrow/array_reader/mod.rs:50:9
      |
   50 | pub use fixed_len_byte_array::make_fixed_len_byte_array_reader;
      |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |
      = note: `#[warn(unused_imports)]` on by default
   
   warning: unused import: `lz4_codec::*`
      --> parquet/src/compression.rs:445:9
       |
   445 | pub use lz4_codec::*;
       |         ^^^^^^^^^^^^
   
   warning: methods `as_any` and `next_batch` are never used
     --> parquet/src/arrow/array_reader/mod.rs:60:8
      |
   59 | pub trait ArrayReader: Send {
      |           ----------- methods in this trait
   60 |     fn as_any(&self) -> &dyn Any;
      |        ^^^^^^
   ...
   66 |     fn next_batch(&mut self, batch_size: usize) -> Result<ArrayRef> {
      |        ^^^^^^^^^^
      |
      = note: `#[warn(dead_code)]` on by default
   
   warning: trait `EncodingWriteSupport` is never used
       --> parquet/src/column/writer/mod.rs:1177:7
        |
   1177 | trait EncodingWriteSupport {
        |       ^^^^^^^^^^^^^^^^^^^^
   
   warning: method `put_spaced` is never used
     --> parquet/src/encodings/encoding/mod.rs:50:8
      |
   42 | pub trait Encoder<T: DataType>: Send {
      |           ------- method in this trait
   ...
   50 |     fn put_spaced(&mut self, values: &[T::T], valid_bits: &[u8]) -> 
Result<usize> {
      |        ^^^^^^^^^^
   
   warning: `parquet` (lib) generated 5 warnings (run `cargo fix --lib -p 
parquet` to apply 2 suggestions)
       Finished `bench` profile [optimized] target(s) in 12.46s
        Running benches/arrow_writer.rs 
(/Users/adamgs/Code/arrow-rs/target/release/deps/arrow_writer-662ce5b834f3cbe1)
   Benchmarking write_batch primitive/4096 values primitive: Warming up for 
3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 5.2s, enable flat sampling, or reduce sample count to 60.
   write_batch primitive/4096 values primitive
                           time:   [1.0128 ms 1.0361 ms 1.0653 ms]
                           thrpt:  [165.16 MiB/s 169.80 MiB/s 173.71 MiB/s]
                    change:
                           time:   [-0.5032% +2.4433% +5.8674%] (p = 0.13 > 
0.05)
                           thrpt:  [-5.5422% -2.3850% +0.5058%]
                           No change in performance detected.
   Found 7 outliers among 100 measurements (7.00%)
     2 (2.00%) high mild
     5 (5.00%) high severe
   write_batch primitive/4096 values primitive with bloom filter
                           time:   [5.9182 ms 6.0575 ms 6.2265 ms]
                           thrpt:  [28.255 MiB/s 29.044 MiB/s 29.728 MiB/s]
                    change:
                           time:   [-13.706% -8.2254% -3.3002%] (p = 0.00 < 
0.05)
                           thrpt:  [+3.4128% +8.9626% +15.883%]
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     3 (3.00%) high mild
     9 (9.00%) high severe
   write_batch primitive/4096 values primitive non-null
                           time:   [828.75 µs 839.72 µs 853.12 µs]
                           thrpt:  [202.22 MiB/s 205.44 MiB/s 208.16 MiB/s]
                    change:
                           time:   [-2.1008% -0.2830% +1.5504%] (p = 0.76 > 
0.05)
                           thrpt:  [-1.5267% +0.2838% +2.1458%]
                           No change in performance detected.
   Found 8 outliers among 100 measurements (8.00%)
     5 (5.00%) high mild
     3 (3.00%) high severe
   write_batch primitive/4096 values primitive non-null with bloom filter
                           time:   [5.8224 ms 5.9514 ms 6.1021 ms]
                           thrpt:  [28.271 MiB/s 28.988 MiB/s 29.630 MiB/s]
                    change:
                           time:   [-2.5804% +0.7869% +4.5698%] (p = 0.66 > 
0.05)
                           thrpt:  [-4.3701% -0.7808% +2.6487%]
                           No change in performance detected.
   Found 12 outliers among 100 measurements (12.00%)
     2 (2.00%) high mild
     10 (10.00%) high severe
   write_batch primitive/4096 values bool
                           time:   [134.80 µs 139.82 µs 146.55 µs]
                           thrpt:  [7.2362 MiB/s 7.5847 MiB/s 7.8673 MiB/s]
                    change:
                           time:   [-1.1953% +3.5315% +7.7557%] (p = 0.11 > 
0.05)
                           thrpt:  [-7.1975% -3.4110% +1.2098%]
                           No change in performance detected.
   Found 10 outliers among 100 measurements (10.00%)
     4 (4.00%) high mild
     6 (6.00%) high severe
   write_batch primitive/4096 values bool non-null
                           time:   [98.311 µs 103.44 µs 109.66 µs]
                           thrpt:  [5.2180 MiB/s 5.5315 MiB/s 5.8204 MiB/s]
                    change:
                           time:   [-5.3081% -0.8963% +3.8002%] (p = 0.71 > 
0.05)
                           thrpt:  [-3.6611% +0.9044% +5.6056%]
                           No change in performance detected.
   Found 11 outliers among 100 measurements (11.00%)
     4 (4.00%) high mild
     7 (7.00%) high severe
   write_batch primitive/4096 values string
                           time:   [477.69 µs 490.59 µs 506.19 µs]
                           thrpt:  [156.96 MiB/s 161.95 MiB/s 166.33 MiB/s]
                    change:
                           time:   [-0.2633% +1.7705% +4.3212%] (p = 0.13 > 
0.05)
                           thrpt:  [-4.1422% -1.7397% +0.2640%]
                           No change in performance detected.
   Found 7 outliers among 100 measurements (7.00%)
     5 (5.00%) high mild
     2 (2.00%) high severe
   write_batch primitive/4096 values string with bloom filter
                           time:   [1.8760 ms 1.9232 ms 1.9792 ms]
                           thrpt:  [40.144 MiB/s 41.313 MiB/s 42.352 MiB/s]
                    change:
                           time:   [-5.6584% -1.9112% +1.8672%] (p = 0.33 > 
0.05)
                           thrpt:  [-1.8329% +1.9485% +5.9977%]
                           No change in performance detected.
   Found 8 outliers among 100 measurements (8.00%)
     3 (3.00%) high mild
     5 (5.00%) high severe
   write_batch primitive/4096 values string dictionary
                           time:   [270.59 µs 274.93 µs 280.55 µs]
                           thrpt:  [170.59 MiB/s 174.07 MiB/s 176.87 MiB/s]
                    change:
                           time:   [-7.9332% -4.4905% -1.4335%] (p = 0.01 < 
0.05)
                           thrpt:  [+1.4544% +4.7016% +8.6168%]
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     3 (3.00%) high mild
     3 (3.00%) high severe
   Benchmarking write_batch primitive/4096 values string dictionary with bloom 
filter: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 5.4s, enable flat sampling, or reduce sample count to 60.
   write_batch primitive/4096 values string dictionary with bloom filter
                           time:   [986.47 µs 1.0237 ms 1.0651 ms]
                           thrpt:  [44.934 MiB/s 46.750 MiB/s 48.516 MiB/s]
                    change:
                           time:   [-7.8716% -3.1143% +2.1637%] (p = 0.22 > 
0.05)
                           thrpt:  [-2.1179% +3.2144% +8.5442%]
                           No change in performance detected.
   Found 11 outliers among 100 measurements (11.00%)
     3 (3.00%) high mild
     8 (8.00%) high severe
   write_batch primitive/4096 values string non-null
                           time:   [532.42 µs 543.79 µs 560.73 µs]
                           thrpt:  [139.95 MiB/s 144.31 MiB/s 147.39 MiB/s]
                    change:
                           time:   [-2.6569% +0.3820% +3.8121%] (p = 0.82 > 
0.05)
                           thrpt:  [-3.6721% -0.3805% +2.7294%]
                           No change in performance detected.
   Found 11 outliers among 100 measurements (11.00%)
     7 (7.00%) high mild
     4 (4.00%) high severe
   write_batch primitive/4096 values string non-null with bloom filter
                           time:   [2.0651 ms 2.1229 ms 2.1874 ms]
                           thrpt:  [35.876 MiB/s 36.966 MiB/s 38.000 MiB/s]
                    change:
                           time:   [-3.8886% +0.0602% +4.1637%] (p = 0.98 > 
0.05)
                           thrpt:  [-3.9973% -0.0601% +4.0460%]
                           No change in performance detected.
   Found 13 outliers among 100 measurements (13.00%)
     3 (3.00%) high mild
     10 (10.00%) high severe
   
   Benchmarking write_batch nested/4096 values primitive list: Warming up for 
3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 6.3s, enable flat sampling, or reduce sample count to 60.
   write_batch nested/4096 values primitive list
                           time:   [1.2357 ms 1.2558 ms 1.2785 ms]
                           thrpt:  [127.71 MiB/s 130.03 MiB/s 132.14 MiB/s]
                    change:
                           time:   [-4.8539% -2.7819% -0.8478%] (p = 0.01 < 
0.05)
                           thrpt:  [+0.8550% +2.8615% +5.1016%]
                           Change within noise threshold.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   Benchmarking write_batch nested/4096 values primitive list non-null: Warming 
up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 7.6s, enable flat sampling, or reduce sample count to 50.
   write_batch nested/4096 values primitive list non-null
                           time:   [1.5065 ms 1.5224 ms 1.5428 ms]
                           thrpt:  [123.16 MiB/s 124.81 MiB/s 126.13 MiB/s]
                    change:
                           time:   [+0.7002% +2.1530% +3.7258%] (p = 0.00 < 
0.05)
                           thrpt:  [-3.5920% -2.1076% -0.6953%]
                           Change within noise threshold.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
        Running benches/metadata.rs 
(/Users/adamgs/Code/arrow-rs/target/release/deps/metadata-2726fdae3ce84590)
   open(default)           time:   [18.947 µs 19.091 µs 19.273 µs]
                           change: [+4.4862% +6.1673% +8.2552%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 12 outliers among 100 measurements (12.00%)
     5 (5.00%) high mild
     7 (7.00%) high severe
   
   open(page index)        time:   [791.01 µs 799.32 µs 810.39 µs]
                           change: [-0.3935% +0.9704% +2.6268%] (p = 0.25 > 
0.05)
                           No change in performance detected.
   Found 5 outliers among 100 measurements (5.00%)
     1 (1.00%) high mild
     4 (4.00%) high severe
   ```
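
The verdict lines in the log ("No change in performance detected", "Change within noise threshold", "Performance has improved", "Performance has regressed") can be read with the rough sketch below. It assumes Criterion's documented defaults of a 0.05 significance level and a 1% noise threshold, and it approximates the decision rule as I understand it rather than reproducing Criterion's own code. `Verdict` and `classify` are hypothetical names.

```rust
// The four verdicts that appear in the benchmark log.
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,
    WithinNoise,
    Improved,
    Regressed,
}

// Assumed Criterion defaults; a benchmark can override both.
const SIGNIFICANCE_LEVEL: f64 = 0.05;
const NOISE_THRESHOLD: f64 = 0.01;

// `lower` and `upper` bound the relative change in time as fractions,
// e.g. -0.137 for -13.7%. Negative values mean the new run was faster.
fn classify(p_value: f64, lower: f64, upper: f64) -> Verdict {
    if p_value >= SIGNIFICANCE_LEVEL {
        // Not statistically significant: reported as no change.
        return Verdict::NoChange;
    }
    if lower < -NOISE_THRESHOLD && upper < -NOISE_THRESHOLD {
        Verdict::Improved
    } else if lower > NOISE_THRESHOLD && upper > NOISE_THRESHOLD {
        Verdict::Regressed
    } else {
        // Significant, but the interval reaches inside the +/-1% noise band.
        Verdict::WithinNoise
    }
}
```

For example, the `primitive with bloom filter` run (p = 0.00, time change between -13.706% and -3.3002%) classifies as improved, while `nested/4096 values primitive list` (p = 0.01, upper bound -0.8478%) stays within the noise threshold.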


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
