etseidl commented on PR #8714:
URL: https://github.com/apache/arrow-rs/pull/8714#issuecomment-3453405712
I just did a quick experiment with the `parquet_footer_parsing` rig. I had
to fix 56.2 to skip binary properly. The "57 (no stats)" columns use the index
to completely skip the bytes for the statistics, rather than still parsing the
Thrift but not materializing anything.
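To make the distinction concrete, here is a minimal sketch of the two ways to get past a statistics blob. It is not the arrow-rs code: the toy encoding (a count byte followed by LEB128 varints) and the function names `skip_by_parsing` and `skip_by_index` are hypothetical stand-ins, chosen only because, like a Thrift struct, the toy record carries no length prefix.

```rust
/// Toy stand-in for a Thrift-encoded statistics struct: one count byte
/// followed by that many LEB128 varints. There is no length prefix, so the
/// end of the record can only be found by decoding every field.
/// Returns the offset of the first byte after the record.
fn skip_by_parsing(buf: &[u8], mut pos: usize) -> usize {
    let field_count = buf[pos] as usize;
    pos += 1;
    for _ in 0..field_count {
        // A varint continues while the high bit is set.
        while buf[pos] & 0x80 != 0 {
            pos += 1;
        }
        pos += 1; // consume the final byte of the varint
    }
    pos
}

/// With an index entry that records the record's byte length (an assumption
/// about what the index provides), skipping is a single addition and the
/// bytes are never touched.
fn skip_by_index(pos: usize, stats_len: usize) -> usize {
    pos + stats_len
}
```

For the buffer `[3, 0x01, 0xAC, 0x02, 0x05, 0xFF]` (three varints: 1, 300, 5, then an unrelated trailing byte), both `skip_by_parsing(&buf, 0)` and `skip_by_index(0, 5)` land on offset 5; only the first has to read every byte to get there.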
Here's an old run on my workstation with 57.0 just before release:
```
+-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------+
| Description                       | Parse Time Arrow 56 | Parse Time Arrow 56       | Parse Time Arrow 57 | Parse Time Arrow 57       | Parse Time Arrow 57 (no stats) | Parse Time Arrow 57 (no stats) |
|                                   | Metadata            | PageIndex (Column/Offset) | Metadata            | PageIndex (Column/Offset) | Metadata                       | PageIndex (Column/Offset)      |
+=========================================================================================================================================================================================================+
| Float 100 cols 20 row groups      | 1.818656ms          | 2.742926ms                | 371.597µs           | 412.292µs                 | 278.182µs                      | 0ns                            |
|-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------|
| Float 1000 cols 20 row groups     | 17.94358ms          | 27.645205ms               | 3.660315ms          | 4.193104ms                | 2.802049ms                     | 0ns                            |
|-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------|
| Float 10000 cols 20 row groups    | 185.972585ms        | 307.935846ms              | 38.203277ms         | 44.805143ms               | 29.839642ms                    | 0ns                            |
|-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------|
| Float 100000 cols 20 row groups   | 1.859111093s        | 3.277136801s              | 387.584434ms        | 464.782303ms              | 311.1496ms                     | 0ns                            |
|-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------|
| String 100 cols 20 row groups     | 1.590131ms          | 2.502389ms                | 445.781µs           | 513.278µs                 | 277.58µs                       | 0ns                            |
|-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------|
| String 1000 cols 20 row groups    | 15.814435ms         | 25.203266ms               | 4.424308ms          | 5.022333ms                | 2.780101ms                     | 0ns                            |
|-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------|
| String 10000 cols 20 row groups   | 163.855822ms        | 269.453287ms              | 45.111337ms         | 55.967408ms               | 29.530731ms                    | 0ns                            |
|-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------|
| String 100000 cols 20 row groups  | 1.650930706s        | 2.882455606s              | 457.848214ms        | 567.259783ms              | 304.9529ms                     | 0ns                            |
+-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------+
```
And here's a run using the index (I didn't set the page index offsets to 0, so
they're still parsed):
```
+-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------+
| Description                       | Parse Time Arrow 56 | Parse Time Arrow 56       | Parse Time Arrow 57 | Parse Time Arrow 57       | Parse Time Arrow 57 (no stats) | Parse Time Arrow 57 (no stats) |
|                                   | Metadata            | PageIndex (Column/Offset) | Metadata            | PageIndex (Column/Offset) | Metadata                       | PageIndex (Column/Offset)      |
+=========================================================================================================================================================================================================+
| Float 100 cols 20 row groups      | 1.782124ms          | 2.788277ms                | 384.006µs           | 437.833µs                 | 190.7µs                        | 442.136µs                      |
|-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------|
| Float 1000 cols 20 row groups     | 17.773675ms         | 27.990747ms               | 3.689049ms          | 4.011346ms                | 1.803588ms                     | 4.077435ms                     |
|-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------|
| Float 10000 cols 20 row groups    | 186.135302ms        | 319.160885ms              | 38.658397ms         | 48.454485ms               | 20.437285ms                    | 45.337169ms                    |
|-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------|
| Float 100000 cols 20 row groups   | 1.850730717s        | 3.308524542s              | 392.43504ms         | 468.130178ms              | 208.983728ms                   | 452.502117ms                   |
|-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------|
| String 100 cols 20 row groups     | 1.551055ms          | 3.540635ms                | 451.13µs            | 535.625µs                 | 190.766µs                      | 522.67µs                       |
|-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------|
| String 1000 cols 20 row groups    | 15.781785ms         | 25.655606ms               | 4.420568ms          | 5.245406ms                | 1.85453ms                      | 5.031554ms                     |
|-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------|
| String 10000 cols 20 row groups   | 162.570823ms        | 272.449084ms              | 45.722412ms         | 58.275058ms               | 20.454023ms                    | 55.498968ms                    |
|-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------|
| String 100000 cols 20 row groups  | 1.624907725s        | 2.803188356s              | 465.555399ms        | 570.005157ms              | 208.100961ms                   | 548.227371ms                   |
+-----------------------------------+---------------------+---------------------------+---------------------+---------------------------+--------------------------------+--------------------------------+
```