Hi all,

While implementing dictionary decoding for nanoarrow's IPC reader [1], I discovered that it is not possible to represent a dictionary-encoded extension type in the IPC schema serialization. I've filed an issue with the details at [2]. The summary is that a Dictionary with Extension values is exported identically to an Extension with Dictionary storage, which usually leads to an error on read (because no extension types actually support dictionary storage types, except maybe arrow.opaque, because it can have arbitrary storage). I was also reminded that arrow-rs can't represent dictionary-encoded extension values at all [3].
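To make the collision concrete, here is a minimal sketch in plain Python. It is a toy model, not the pyarrow or nanoarrow API: the classes `Primitive`, `Dictionary`, `Extension`, and `IpcField`, the helper `to_ipc_field`, and the extension name `example.label` are all hypothetical. It assumes that an IPC field records only the innermost value type, an optional dictionary index type, and the extension name under the `ARROW:extension:name` metadata key, so the nesting order of the two wrappers is lost.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Primitive:
    name: str


@dataclass(frozen=True)
class Dictionary:
    index: Primitive
    value: "DataType"


@dataclass(frozen=True)
class Extension:
    name: str
    storage: "DataType"


DataType = Union[Primitive, Dictionary, Extension]


@dataclass(frozen=True)
class IpcField:
    """Toy stand-in for a serialized IPC field (assumed, simplified layout)."""
    type: str                             # innermost value/storage type
    dictionary_index_type: Optional[str]  # set when the field is dictionary-encoded
    custom_metadata: tuple                # sorted (key, value) pairs


def to_ipc_field(dtype: DataType) -> IpcField:
    metadata = {}
    index_type = None
    # Peel off the wrappers in whatever order they occur. Nothing in the
    # flattened result records which wrapper was the outer one.
    while not isinstance(dtype, Primitive):
        if isinstance(dtype, Extension):
            metadata["ARROW:extension:name"] = dtype.name
            dtype = dtype.storage
        else:
            index_type = dtype.index.name
            dtype = dtype.value
    return IpcField(dtype.name, index_type, tuple(sorted(metadata.items())))


utf8, int32 = Primitive("utf8"), Primitive("int32")
dict_of_ext = Dictionary(int32, Extension("example.label", utf8))
ext_of_dict = Extension("example.label", Dictionary(int32, utf8))

# Both nestings flatten to the same field, so a reader cannot tell them apart.
print(to_ipc_field(dict_of_ext) == to_ipc_field(ext_of_dict))  # True
```

A reader that sees this flattened field has to pick one interpretation, and picking Extension with Dictionary storage is what produces the error on read described above.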
Given that there are now a number of canonical extension types, I wonder if there should be a clearer route to roundtripping dictionary-encoded extension types over IPC (either by making this possible to represent in IPC or by making it clear that extension type implementations must handle dictionary-encoded storage). Somewhere in the middle would be handling the error on deserialization (i.e., if the extension type in the registry doesn't support dictionary-encoded storage, fall back to a dictionary with extension values).

Cheers,

-dewey

[1] https://github.com/apache/arrow-nanoarrow/pull/861
[2] https://github.com/apache/arrow/issues/49704
[3] https://github.com/apache/arrow-rs/issues/7982
