emilk opened a new issue, #10069:
URL: https://github.com/apache/arrow-rs/issues/10069

   ### Background
   Both `RecordBatch`/`Schema` and `Field` can have `metadata`. In both cases it is stored as a `HashMap<String, String>`.
   
   One downside is that cloning the metadata is slow: it requires a deep clone and many allocations. This contrasts with almost everything else in `arrow-rs`, which uses an `Arc` for fast cloning.
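
To make the cost difference concrete, here is a minimal sketch using only the standard library. The helper names and the sample keys are made up for illustration and are not part of `arrow-rs`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Builds a small metadata map; keys and values are illustrative only.
fn sample_metadata() -> HashMap<String, String> {
    HashMap::from([
        ("unit".to_string(), "meters".to_string()),
        ("source".to_string(), "sensor-a".to_string()),
    ])
}

/// Deep clone: allocates a new table and copies every key and value.
fn deep_clone(map: &HashMap<String, String>) -> HashMap<String, String> {
    map.clone()
}

/// Shallow clone: only increments the reference count, no string is copied.
fn shared_clone(map: &Arc<HashMap<String, String>>) -> Arc<HashMap<String, String>> {
    Arc::clone(map)
}
```

After `deep_clone`, every `String` lives in a fresh allocation, whereas the two handles returned around `shared_clone` point at the same map (`Arc::ptr_eq` returns `true`).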
   
   ### Proposal outline
   
   ```rs
   #[derive(Clone, Default, …)]
   pub struct Metadata(
       // Use `Option` to avoid allocation in case of empty metadata
       Option<Arc<BTreeMap<String, String>>>
   );
   
   impl Metadata {
       pub fn get(&self, key: &str) -> Option<&String> { … }
   
       /// Does deep clone if (and only if) this `Metadata` is shared
       pub fn insert(&mut self, key: impl Into<String>, value: impl Into<String>) {
           Arc::make_mut(self.0.get_or_insert_default()).insert(key.into(), value.into());
       }
   
       …
   }
   
   impl Index<…> for Metadata …
   
   impl From<HashMap<String, String>> for Metadata { … }
   impl From<BTreeMap<String, String>> for Metadata { … }
   impl From<Metadata> for HashMap<String, String> { … }
   impl From<Metadata> for BTreeMap<String, String> { … }
   
   impl IntoIterator, FromIterator, …
   ```
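
For reference, a compilable subset of the outline might look like the sketch below. It covers only `get`, `insert`, and one `From` impl. `shares_storage_with` is a hypothetical helper added here to observe the copy-on-write behavior and is not part of the proposal. `get_or_insert_with(Default::default)` stands in for `get_or_insert_default` so the sketch also builds on older toolchains.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Copy-on-write metadata; `None` stands for "empty" and needs no allocation.
#[derive(Clone, Debug, Default)]
pub struct Metadata(Option<Arc<BTreeMap<String, String>>>);

impl Metadata {
    pub fn get(&self, key: &str) -> Option<&String> {
        self.0.as_ref()?.get(key)
    }

    /// Deep-clones the map only if another `Metadata` still shares it.
    pub fn insert(&mut self, key: impl Into<String>, value: impl Into<String>) {
        let map = self.0.get_or_insert_with(Default::default);
        Arc::make_mut(map).insert(key.into(), value.into());
    }

    /// Hypothetical helper: true if both values point at the same allocation.
    pub fn shares_storage_with(&self, other: &Metadata) -> bool {
        match (&self.0, &other.0) {
            (Some(a), Some(b)) => Arc::ptr_eq(a, b),
            _ => false,
        }
    }
}

impl From<BTreeMap<String, String>> for Metadata {
    fn from(map: BTreeMap<String, String>) -> Self {
        // Keep the "empty means `None`" convention so empty metadata never allocates an `Arc`.
        if map.is_empty() {
            Self(None)
        } else {
            Self(Some(Arc::new(map)))
        }
    }
}
```

Cloning a `Metadata` and then calling `insert` on the clone leaves the original untouched: `Arc::make_mut` sees a reference count above one and copies the map before mutating it.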
   
   ### PRO/CON vs status quo (`HashMap<String, String>`)
   * PRO: Fast cloning of the whole `Metadata`
   * PRO: Deterministic iteration order (thanks to `BTreeMap`), which is good for IPC/FFI encoding, test stability, hashing, …
   * NEUTRAL: Can still add/remove `Metadata` fields without extra cost, as long as the `Metadata` is not shared (a shared one is deep-cloned first)
   * CON: New type; more complexity
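
The deterministic-order point can be illustrated with a short sketch. `encode` is a hypothetical stand-in for an IPC/FFI encoder, not an `arrow-rs` function, and the keys are invented.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an encoder: joins entries as `key=value;`.
fn encode(metadata: &BTreeMap<String, String>) -> String {
    metadata
        .iter()
        .map(|(k, v)| format!("{}={};", k, v))
        .collect()
}

/// Builds a map from pairs, inserting them in the order given.
fn build(pairs: &[(&str, &str)]) -> BTreeMap<String, String> {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

Because `BTreeMap` iterates in key order, `build(&[("unit", "meters"), ("source", "sensor-a")])` and the same pairs in reverse insertion order both encode to `source=sensor-a;unit=meters;`. A `HashMap` gives no such guarantee across instances or runs.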
   
   
   ### Alternatives
   Instead of storing `String`, we could store `Arc<str>`. That would make it 
efficient to share the same keys across many metadata tables.
   
   The downside is added complexity.
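
A rough sketch of what the `Arc<str>` variant could look like, assuming a plain `BTreeMap<Arc<str>, Arc<str>>`. The `SharedMetadata` alias and the helper function are hypothetical names used only for illustration.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Metadata table whose keys and values are reference-counted string slices.
type SharedMetadata = BTreeMap<Arc<str>, Arc<str>>;

/// Builds one table per value, reusing a single allocation for the key.
fn tables_with_shared_key(key: &str, values: &[&str]) -> Vec<SharedMetadata> {
    let key: Arc<str> = Arc::from(key);
    values
        .iter()
        .map(|value| {
            let value: Arc<str> = Arc::from(*value);
            // Cloning the key only bumps its reference count.
            BTreeMap::from([(Arc::clone(&key), value)])
        })
        .collect()
}
```

Lookups by `&str` still work because `Arc<str>` implements `Borrow<str>`. The cost is that every API taking or returning keys and values now has to deal with `Arc<str>` instead of `String`.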

