[I] Row format that does not guarantee ordering of columns [arrow-rs]

via GitHub Thu, 01 Jan 2026 06:46:59 -0800


rluvaton opened a new issue, #9083:
URL: https://github.com/apache/arrow-rs/issues/9083


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   Yes. I'm using `RowConverter` and it is starting to become a bottleneck.
   Some use cases of row conversion do not need ordering, for example:
   1. grouping - grouping by many columns has no ordering requirement
   2. aggregation - `count(distinct <struct>)` or `array_agg` with distinct, neither of which has an ordering requirement
   3. shuffling
   
   Having ordering requirements limits which optimizations are possible.
   
   The bottlenecks I found, and how the ordering requirement limits optimizations, are described below.
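
To illustrate what the ordering guarantee costs at the encoding level, here is a minimal sketch in plain Rust (standard library only; it is not the arrow-rs implementation). It assumes the documented row-format rule that signed integers are stored big-endian with the sign bit flipped. `encode_ordered` and `encode_unordered` are hypothetical names, and the unordered variant is only one possible design for a format that needs equality but not order.

```rust
/// Order-preserving encoding of a non-null i32: flip the sign bit and store
/// the result big-endian, so that comparing the bytes lexicographically
/// gives the same answer as comparing the integers.
fn encode_ordered(v: i32) -> [u8; 4] {
    ((v as u32) ^ 0x8000_0000).to_be_bytes()
}

/// Hypothetical unordered encoding: a plain copy of the native bytes.
/// Equal values still produce equal bytes, which is all that grouping,
/// distinct aggregation, and shuffling need.
fn encode_unordered(v: i32) -> [u8; 4] {
    v.to_ne_bytes()
}

fn main() {
    let values = [-5i32, -1, 0, 1, 256];

    // Byte order matches numeric order for the ordered encoding.
    for pair in values.windows(2) {
        assert!(encode_ordered(pair[0]) < encode_ordered(pair[1]));
    }

    // The unordered encoding keeps equality and distinctness...
    assert_eq!(encode_unordered(42), encode_unordered(42));
    assert_ne!(encode_unordered(-1), encode_unordered(1));

    // ...but its byte order is not numeric order: -1 is all 0xFF bytes,
    // so it compares greater than 1 regardless of endianness.
    assert!(encode_unordered(-1) > encode_unordered(1));
}
```

The ordered variant does extra work per value (sign flip and byte swap on little-endian machines) and fixes the byte layout, while the unordered variant is free to use whatever layout is cheapest to write and read back.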
   
   **Describe the solution you'd like**
   <!--
   A clear and concise description of what you want to happen.
   -->
   
   **Describe alternatives you've considered**
   <!--
   A clear and concise description of any alternative solutions or features 
you've considered.
   -->
   
   **Additional context**
   <!--
   Add any other context or screenshots about the feature request here.
   -->
   
   -----------
   
   All my benchmarks and profiling were done on the most reliable machine I 
could get:
   
   ```
   c5.metal
   ``` 
   
   <details><summary>Env</summary>
   <p>
   
   
   ```bash
   [ec2-user@ip-172-31-21-167 build]$ ./fastfetch -c ../presets/all.jsonc
                         ec2-user@ip-
                         -------------------------
     ,     #_            OS: Amazon Linux 2023.9.20251208 x86_64
     ~\_  ####_          Host: c5.metal (00001)
    ~~  \_#####\         BIOS (Legacy): 1.0 (4.15)
    ~~     \###|         Chassis: Rack Mount Chassis
    ~~       \#/ ___     Kernel: Linux 6.1.158-180.294.amzn2023.x86_64
     ~~       V~' '->    Init System: systemd 252.23-10.amzn2023
      ~~~         /      Uptime: 3 hours, 52 mins
        ~~._.   _/       Loadavg: 0.04, 0.08, 0.05
           _/ _/         Processes: 825
         _/m/'           Shell: bash 5.2.15
                         LM: sshd 8.7p1 (TTY)
                         Terminal: /dev/pts/0
                         Terminal Size: 160 columns x 57 rows (3200px x 2508px)
                         Terminal Theme: #ACB2BE (FG) - #21252A (BG) [Dark]
                         CPU: 2 x Intel(R) Xeon(R) Platinum 8275CL (96) @ 3.90 GHz
                         CPU Cache (L1): 48x32.00 KiB (D), 48x32.00 KiB (I)
                         CPU Cache (L2): 48x1.00 MiB (U)
                         CPU Cache (L3): 2x35.75 MiB (U)
                         CPU Usage: 0%
                         Memory: 2.00 GiB / 188.52 GiB (1%)
                         Disk (/): 15.36 GiB / 255.93 GiB (6%) - xfs
                         Date & Time: 2026-01-01 11:53:52
                         Locale: C.UTF-8
                         Network IO (enp125s0): 1.75 KiB/s (IN) - 4.69 KiB/s (OUT)
                         Disk IO (Amazon Elastic Block Store): 0 B/s (R) - 4.00 KiB/s (W)
                         Physical Disk (Amazon Elastic Block Store): 256.00 GiB [SSD, Fixed]
                         Version: fastfetch 2.56.1-16 (x86_64)
   ```
   ```bash
   $ ./cpufetch --verbose
   
    Name:                Intel Xeon Platinum 8275CL
    Microarchitecture:   Cascade Lake
    Technology:          14nm
    Max Frequency:       3.900 GHz
    Sockets:             2
    Cores:               24 cores (48 threads)
    Cores (Total):       48 cores (96 threads)
    AVX:                 AVX,AVX2,AVX512
    FMA:                 FMA3
    L1i Size:            32KB (1.5MB Total)
    L1d Size:            32KB (1.5MB Total)
    L2 Size:             1MB (48MB Total)
    L3 Size:             35.75MB (71.5MB Total)
    Peak Performance:    11.98 TFLOP/s
   ```
   
   </p>
   </details> 
   
   ------
   
   When I set out to improve the performance of row conversion, I started with the easiest case, but with multiple columns.
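
A minimal sketch of what such a case could look like, assuming it means several non-null, fixed-width columns (the text above does not say which types were used). It uses only the standard library rather than `RowConverter`; `to_unordered_rows` and `group_ids` are hypothetical helpers, and the rows are only meaningful for equality, not for ordering.

```rust
use std::collections::HashMap;

/// Packs equal-length, non-null i64 columns into fixed-width rows stored in
/// one flat buffer. Bytes are copied as-is, so rows can be compared for
/// equality but their byte order carries no meaning.
fn to_unordered_rows(columns: &[Vec<i64>]) -> (Vec<u8>, usize) {
    let num_rows = columns.first().map_or(0, |c| c.len());
    let row_width = columns.len() * 8;
    let mut buf = vec![0u8; num_rows * row_width];
    for (col_idx, col) in columns.iter().enumerate() {
        assert_eq!(col.len(), num_rows, "all columns must have the same length");
        let offset = col_idx * 8;
        for (row_idx, v) in col.iter().enumerate() {
            let start = row_idx * row_width + offset;
            buf[start..start + 8].copy_from_slice(&v.to_ne_bytes());
        }
    }
    (buf, row_width)
}

/// Assigns a dense group id to every row by hashing the raw row bytes.
fn group_ids(buf: &[u8], row_width: usize) -> Vec<usize> {
    if row_width == 0 {
        return Vec::new(); // no columns, so there are no rows to group
    }
    let mut groups: HashMap<&[u8], usize> = HashMap::new();
    buf.chunks_exact(row_width)
        .map(|row| {
            let next = groups.len();
            *groups.entry(row).or_insert(next)
        })
        .collect()
}

fn main() {
    // Rows: (1, 10), (2, 20), (1, 10), (2, 30)
    let columns = vec![vec![1i64, 2, 1, 2], vec![10i64, 20, 10, 30]];
    let (buf, row_width) = to_unordered_rows(&columns);
    let ids = group_ids(&buf, row_width);
    assert_eq!(ids, vec![0, 1, 0, 2]);
    println!("{ids:?}");
}
```

Because every row has the same width, the writer needs no per-row offsets and no order-preserving transformation of the values, which is the kind of freedom an unordered row format would allow.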
   
   All benchmarks were also run with the following build settings:
   
   `.cargo/config`:
   ```toml
   [build]
   rustflags = ["-C", "force-frame-pointers=yes"]
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
