Re: [PR] [Variant] Speedup `ObjectBuilder` (62x faster) [arrow-rs]

via GitHub Sat, 28 Jun 2025 19:29:21 -0700


friendlymatthew commented on PR #7808:
URL: https://github.com/apache/arrow-rs/pull/7808#issuecomment-3016236400


   Hi, I'm thinking about two ends of the spectrum: objects with a known schema 
(schema-on-write) and objects with an unknown schema (schema-on-read). Most 
variant use cases seem to fall somewhere in the middle, with data that's only 
partially homogeneous. 
   
   My benchmarks cover all three scenarios. For each one, I create benchmarks 
that build both individual Variant objects and a list of Variant objects. I 
also included `bench_object_field_names_reverse_order`. This case builds an 
object whose field names are inserted in reverse lexicographical order. I think 
this is interesting because it lets us test the sorting behavior in 
`ObjectBuilder::finish`.
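   
   To make the reverse-order case concrete, here is a minimal, self-contained 
sketch of the idea. It does not use the real `ObjectBuilder` API: 
`ToyObjectBuilder`, `reverse_order_names`, and `build_reverse_order_object` are 
hypothetical names for illustration only, and the only assumption carried over 
from the PR is that field names get sorted when the object is finished.

```rust
use std::time::Instant;

/// Hypothetical stand-in for an object builder (NOT the arrow-rs type).
/// It buffers fields in insertion order and sorts them by name in `finish`.
struct ToyObjectBuilder {
    fields: Vec<(String, i64)>,
}

impl ToyObjectBuilder {
    fn new() -> Self {
        Self { fields: Vec::new() }
    }

    fn insert(&mut self, name: &str, value: i64) {
        self.fields.push((name.to_string(), value));
    }

    /// Sorts the buffered fields by name and returns them.
    fn finish(mut self) -> Vec<(String, i64)> {
        self.fields.sort_by(|a, b| a.0.cmp(&b.0));
        self.fields
    }
}

/// Generates `n` zero-padded field names in reverse lexicographical order.
fn reverse_order_names(n: usize) -> Vec<String> {
    (0..n).rev().map(|i| format!("field_{i:05}")).collect()
}

/// Inserts the names in reverse order, so `finish` has to reorder every field.
fn build_reverse_order_object(n: usize) -> Vec<(String, i64)> {
    let mut builder = ToyObjectBuilder::new();
    for (i, name) in reverse_order_names(n).iter().enumerate() {
        builder.insert(name, i as i64);
    }
    builder.finish()
}

fn main() {
    let start = Instant::now();
    let fields = build_reverse_order_object(1_000);
    let elapsed = start.elapsed();

    // After `finish`, names must be in strictly ascending order.
    assert!(fields.windows(2).all(|w| w[0].0 < w[1].0));
    println!("built {} fields in {:?}", fields.len(), elapsed);
}
```

   The `Instant` timing is only a rough illustration; the screenshots below 
come from the actual benchmark suite, not from this sketch.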
   
   I ran benchmarks locally (M4 silicon) and I observe a nice improvement in 
performance: 
   
   <img width="1072" alt="Screenshot 2025-06-28 at 10 19 51 PM" 
src="https://github.com/user-attachments/assets/4e7c083b-d32a-4758-8ab1-3da57d9e3304"
 />
   
   <img width="1014" alt="Screenshot 2025-06-28 at 10 19 56 PM" 
src="https://github.com/user-attachments/assets/78f36609-66ec-46e0-8c3a-6fb453373581"
 />
   
   <br>
   
   `bench_object_list_same_schemas` seems to have a small regression, but it is 
barely above noise, and given the much larger improvements in the other 
benchmarks I'm not too concerned about it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
