alamb opened a new issue, #7698:
URL: https://github.com/apache/arrow-rs/issues/7698

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   @scovich  noted 
https://github.com/apache/arrow-rs/pull/7653#discussion_r2147021588
   
   > It's a bit unfortunate we have to double-allocate strings. Ideally the 
dictionary could track &str backed by dict_keys, but I don't know how to manage 
mutability and lifetimes to achieve that...
   
   The basic question / observation is that building up the dictionary will 
likely be a performance bottleneck.
   
   For example something like this will likely be pretty slow:
   ```rust
   let mut builder = VariantBuilder:new();
   let mut object = builder.new_object();
   for i in 0..10000 {
     object.set_value(&format!("foo{i}", 42)
   }
   ```
   
   **Describe the solution you'd like**
   1. Figure out a more performant way to manage the dictionary
   2. Also figure out how to create sorted dictionaries
   
   
   **Describe alternatives you've considered**
   @zeroshade described a pretty elegant solution in the Go variant builder 
which was to write the dictionary fields in the order they are encountered., 
and then decides if the dictionary is sorted when writing it out. The builder 
has an additional API that you can use to directly add a dictionary key
   
   
   One benefit of this approach is that different writers can decide if they 
care about sorted dictionaries and if they do they can do a pre-pass to figure 
them out and add them directly to the builder
   
   For example this might look like 
   ```rust
   let mut builder = VariantBuilder:new();
   // somehow we know that the object will only have keys "a", "b" and "c"
   builder.add_field_name("a");
   builder.add_field_name("b");
   builder.add_field_name("c");
   
   // as long as the objects don't add a new field name, the keys remain sorted
   let mut object = builder.new_object();
   builder.set_field("b", 4);
   builder.set_field("c", 5);
   builder.set_field("a", 6);
   
   // The final variant will have sorted dictionaries
   let (metadata, value) = builder.build();
   let variant_metadata = VariantMetadata::try_new(&metadata).uwnrap()
   assert(variant_metadata.is_sorted())
   ```
   
   
   **Additional context**
   <!--
   Add any other context or screenshots about the feature request here.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to