Rich-T-kid commented on issue #10119:
URL: https://github.com/apache/arrow-rs/issues/10119#issuecomment-4682922551

   @JakeDern were you planning on making a new benchmark or updating the 
existing benchmarks?
   FWIW, I think it'd also be worth isolating the dictionary path into its own 
benchmarks (one for the writer and one for the reader), because of what you mentioned:
   > Dictionaries have a lot of special handling in IPC writer code, which we 
want to optimize.
   
   Since the writer has so much other logic, it'd make sense to have benchmarks 
that each focus on a small section, for example `_encode_dictionaries()`, 
`encode_dictionaries()`, and the `DictionaryTracker` struct.
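   To make "focus on small sections" concrete, here is a minimal, std-only Rust sketch of the pattern: build the inputs once, then time only a single step. `encode_keys` and `bench` are hypothetical names; `encode_keys` stands in for a dictionary-encoding step and is not `encode_dictionaries()` or any arrow-rs API. A real benchmark in the repo would more likely use criterion than a hand-rolled `Instant` loop.

```rust
use std::collections::HashMap;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a dictionary-encoding step (not an arrow-rs API):
/// maps each value to a key, appending unseen values to `dict_values`.
fn encode_keys<'a>(
    values: &[&'a str],
    index: &mut HashMap<&'a str, u32>,
    dict_values: &mut Vec<&'a str>,
    keys: &mut Vec<u32>,
) {
    keys.clear();
    for &v in values {
        let key = *index.entry(v).or_insert_with(|| {
            dict_values.push(v);
            (dict_values.len() - 1) as u32
        });
        keys.push(key);
    }
}

/// Times only `encode_keys`; the input data is built by the caller, outside the loop.
/// Returns the mean time per iteration. `iters` must be non-zero.
fn bench(values: &[&str], iters: u32) -> Duration {
    let mut index = HashMap::new();
    let mut dict_values = Vec::new();
    let mut keys = Vec::with_capacity(values.len());
    let start = Instant::now();
    for _ in 0..iters {
        // Reset state so every iteration does the same work
        // (the reset itself is included in the timing).
        index.clear();
        dict_values.clear();
        encode_keys(black_box(values), &mut index, &mut dict_values, &mut keys);
        black_box(&keys);
    }
    start.elapsed() / iters
}

fn main() {
    // Low-cardinality input, the typical case for dictionary encoding.
    let values: Vec<&str> = (0..10_000).map(|i| ["a", "b", "c", "d"][i % 4]).collect();
    let per_iter = bench(&values, 100);
    println!("encode_keys: {:?} per iteration", per_iter);
}
```

Keeping each benchmark this narrow means a regression shows up against one function instead of being diluted by the rest of the writer.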
   I also think the dictionary-focused benchmarks could broaden the benchmark 
structure to cover different usage patterns, such as the streaming behavior 
the dictionary format was designed around (see the [arrow-ipc 
docs](https://arrow.apache.org/docs/format/Columnar.html#dictionary-messages)). 
It would be nice to have benchmarks that check that the buffer space used to 
track dictionary mappings is reused rather than repeatedly allocated and 
freed.
   From the docs:
   > Alternatively, if isDelta is set to false, then the dictionary replaces 
the existing dictionary for the same ID.
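   A sketch of what such a reuse check could look like, assuming a toy tracker rather than arrow-rs's actual `DictionaryTracker`: on a replacement (`isDelta = false`) it clears and refills the existing `Vec` for that dictionary ID, and the check compares the buffer address before and after to confirm the allocation was reused. `ToyTracker`, `replace`, `append_delta`, and `buffer_ptr` are all hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical tracker (not arrow-rs's `DictionaryTracker`): stores the
/// current dictionary values for each dictionary ID.
#[derive(Default)]
struct ToyTracker {
    dicts: HashMap<i64, Vec<String>>,
}

impl ToyTracker {
    /// isDelta = false: replace the dictionary for `id`. `clear()` keeps the
    /// Vec's capacity, so the outer buffer is reused when the new dictionary
    /// fits; each String is still freshly allocated.
    fn replace(&mut self, id: i64, values: &[&str]) {
        let buf = self.dicts.entry(id).or_default();
        buf.clear();
        buf.extend(values.iter().map(|v| v.to_string()));
    }

    /// isDelta = true: append new values to the existing dictionary for `id`.
    fn append_delta(&mut self, id: i64, values: &[&str]) {
        self.dicts
            .entry(id)
            .or_default()
            .extend(values.iter().map(|v| v.to_string()));
    }

    fn get(&self, id: i64) -> Option<&[String]> {
        self.dicts.get(&id).map(|v| v.as_slice())
    }

    /// Address of the Vec's heap buffer; unchanged across a replacement
    /// means the buffer was reused rather than reallocated.
    fn buffer_ptr(&self, id: i64) -> Option<*const String> {
        self.dicts.get(&id).map(|v| v.as_ptr())
    }
}

fn main() {
    let mut tracker = ToyTracker::default();
    tracker.replace(0, &["a", "b", "c"]);
    let before = tracker.buffer_ptr(0);

    // A replacement dictionary for the same ID (isDelta = false).
    tracker.replace(0, &["x", "y"]);
    assert_eq!(tracker.buffer_ptr(0), before, "buffer was reallocated");

    // A delta dictionary (isDelta = true) extends the current one.
    tracker.append_delta(0, &["z"]);
    println!("{:?}", tracker.get(0)); // prints Some(["x", "y", "z"])
}
```

A benchmark built around this pattern would stream many batches that alternate replacement and delta dictionaries, then assert on allocation behavior as well as timing.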
   
   Like I mentioned before, I haven't looked too closely at the dictionary path, 
but feel free to tag me in the benchmarks PR and I'll be happy to take a look!

