Interesting. This doesn't seem to be a Java issue per se, then. I've seen admonitions in various Arrow Java threads to always specify the Charset for the conversion, and so assumed more than one Charset was legal, and have written Arrow Java test code that uses other charsets without ill effect.
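
As a rough illustration of why the Charset matters once a second reader is involved, here is a minimal JDK-only sketch. It does not use any Arrow classes, and `CharsetRoundTrip` and `roundTrip` are names made up for the example. It encodes a String with one Charset and decodes the resulting bytes with another, which is roughly what I'd assume happens if a consumer treats bytes written as UTF-16 as UTF-8:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Hypothetical helper: encode `value` with `writtenAs`, then decode the
    // same bytes with `readAs`, i.e. what a reader assuming `readAs` would see.
    public static String roundTrip(String value, Charset writtenAs, Charset readAs) {
        byte[] bytes = value.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo";

        // Same Charset on both sides: the String survives intact.
        String same = roundTrip(original, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16LE);
        System.out.println("UTF-16LE -> UTF-16LE intact: " + same.equals(original));

        // Writer used UTF-16LE, reader assumes UTF-8: the String is mangled.
        String mixed = roundTrip(original, StandardCharsets.UTF_16LE, StandardCharsets.UTF_8);
        System.out.println("UTF-16LE -> UTF-8 intact: " + mixed.equals(original));
    }
}
```

Within a single Java process the writer and reader agree on the Charset, which would explain why test code like mine never noticed a problem.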
I've never attempted to transport that data over the wire or export it using the C-Data Interface, however. It seems like that's where it would fall down.

On Thu, Sep 29, 2022 at 3:01 PM James Henderson <j...@juxt.pro> wrote:

> FWIW we'd made a similar assumption. In Schema.fbs [1] the type is called
> Utf8, as well as the Java `ArrowType.Utf8` class - is this a required
> assumption to work with other language Arrow libs, maybe?
>
> James
>
> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs
>
> On Thu, 29 Sept 2022 at 18:57, Larry White <ljw1...@gmail.com> wrote:
>
> > Hi Kevin,
> >
> > I don't know of any particular restriction regarding string encoding.
> > VarCharVector stores data as a byte array, and the encoding can be set
> > using the Charset class when you convert Strings to and from bytes. Since
> > java strings use UTF-16 internally, I would expect this to 'just work'.
> >
> > larry
> >
> > On Thu, Sep 29, 2022 at 12:46 PM Kevin Bambrick <kevinbambri...@gmail.com>
> > wrote:
> >
> > > Hi.
> > >
> > > Was just wondering was support for UTF-16 Strings considered? As far as I
> > > am aware VarChar vectors only support UTF-8. Are they something that may be
> > > supported in the future?
> > >
> > > Regards.
> > > Kevin.
>
> --
> *James Henderson*
> XTDB Development Manager at *JUXT*
>
> Email j...@juxt.pro
> Website https://juxt.pro