friendlymatthew commented on PR #7808: URL: https://github.com/apache/arrow-rs/pull/7808#issuecomment-3016236400
Hi, I'm thinking about two ends of the spectrum: objects with a known schema (schema-on-write) and objects with an unknown schema (schema-on-read). Most variant use cases seem to fall somewhere in the middle, with data that's only partially homogeneous. My benchmarks cover all three scenarios. For each one, I created benchmarks that build both individual Variant objects and a list of Variant objects.

I also included `bench_object_field_names_reverse_order`. This case builds an object whose field names are inserted in reverse lexicographical order. I think this is interesting because it exercises the sorting behavior in `ObjectBuilder::finish`.

I ran the benchmarks locally (M4 silicon) and observed a nice improvement in performance:

<img width="1072" alt="Screenshot 2025-06-28 at 10 19 51 PM" src="https://github.com/user-attachments/assets/4e7c083b-d32a-4758-8ab1-3da57d9e3304" />
<img width="1014" alt="Screenshot 2025-06-28 at 10 19 56 PM" src="https://github.com/user-attachments/assets/78f36609-66ec-46e0-8c3a-6fb453373581" />

`bench_object_list_same_schemas` seems to have a small regression, but it's only slightly above noise, and the much larger improvements to the other flows make me not too concerned about it.
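To make the reverse-order case concrete, here is a minimal standard-library sketch of the idea. It is not the benchmark code from this PR and does not use the real `ObjectBuilder`; `reverse_ordered_field_names`, `sort_fields_on_finish`, and `time_finish_sort` are hypothetical names, and the sort is only a stand-in for whatever `ObjectBuilder::finish` does internally.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: produce `n` field names in reverse lexicographical order,
/// so every insertion lands "before" all previously inserted names.
fn reverse_ordered_field_names(n: usize) -> Vec<String> {
    let mut names: Vec<String> = (0..n).map(|i| format!("field_{i:03}")).collect();
    names.sort();
    names.reverse();
    names
}

/// Stand-in for a finish-time sort (an assumption, not the real implementation):
/// pair each name with its insertion index, then order the pairs by name.
fn sort_fields_on_finish(names: &[String]) -> Vec<(usize, &str)> {
    let mut fields: Vec<(usize, &str)> = names
        .iter()
        .enumerate()
        .map(|(idx, name)| (idx, name.as_str()))
        .collect();
    fields.sort_by(|a, b| a.1.cmp(b.1));
    fields
}

/// Crude wall-clock timing loop; a real benchmark would use a harness instead.
fn time_finish_sort(names: &[String], iterations: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the work being timed.
        black_box(sort_fields_on_finish(black_box(names)));
    }
    start.elapsed()
}
```

Calling `time_finish_sort(&reverse_ordered_field_names(100), 10_000)` gives a rough timing for the reversed input; comparing it against already-sorted names would show how sensitive the sort step is to insertion order.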