Also, the driver shouldn't assume UTF-8 (or any encoding) when constructing a 
String from a Binary vector, since that defeats the point of a binary vector! 
Perhaps this should somehow be configurable (though having a lot of little 
configuration options is also not ideal). A parameterized extension type is 
probably the best way to solve this.

On Fri, Sep 30, 2022, at 13:04, Antoine Pitrou wrote:
> Le 30/09/2022 à 18:57, Kevin Bambrick a écrit :
>> The issue I am facing is sending a UTF-16 string over the wire.
>
> Ok, then you can just transcode the strings before sending them as 
> String, *or* you can send them as Binary (not String).
>
> Where do these UTF-16 strings come from?
>
>  > What would the difference be between adding a new data type and an
>  > extension type for UTF-16?
>
> An extension type is for the most part a piece of metadata attached to 
> data represented in an existing data type (such as Binary), and that 
> consumers can optionally recognize in order to better interpret the data.
>
> So if one were to make a UTF-16 extension type based on the Binary data 
> type, implementations could either recognize it as Binary or as UTF-16, 
> depending on whether they know about that particular extension type or not.
>
> (in practice, it would make more sense to make a parameterized "encoded 
> text" extension type, instead of making a specific one for UTF-16)
>
> I recommend reading about the Arrow columnar format and especially this 
> section about extension types:
> https://arrow.apache.org/docs/format/Columnar.html#extension-types
>
>
> Regards
>
> Antoine.
