Interesting. This doesn't seem to be a Java issue per se, then. I've seen
admonitions in various Arrow Java threads to always specify the Charset for
the conversion, and so assumed more than one Charset was legal. I have also
written Arrow Java test code that uses other charsets without ill effect.

I've never attempted to transport that data over the wire or export it
using the C-Data Interface, however. That seems like where it would break
down, since a consumer in another language would presumably decode the
bytes as UTF-8.
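
A minimal sketch of that failure mode, using only the JDK and no Arrow classes. The class and method names (CharsetMismatchSketch, encode, decodeAsUtf8) are hypothetical, and the byte arrays merely stand in for whatever a VarCharVector would hold. It assumes the consumer trusts the Utf8 type name and decodes accordingly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of the Arrow Java API.
class CharsetMismatchSketch {

    // Producer side: turn a String into the bytes that would be stored.
    static byte[] encode(String value, Charset charset) {
        return value.getBytes(charset);
    }

    // Consumer side (assumption): bytes in a Utf8 column are decoded as UTF-8.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo"; // "hello" with an accented e

        byte[] utf8Bytes = encode(original, StandardCharsets.UTF_8);
        byte[] utf16Bytes = encode(original, StandardCharsets.UTF_16LE);

        // Same text, different byte layouts.
        System.out.println("UTF-8 length:    " + utf8Bytes.length);  // 6
        System.out.println("UTF-16LE length: " + utf16Bytes.length); // 10

        // Round trip succeeds only when producer and consumer agree.
        System.out.println(original.equals(decodeAsUtf8(utf8Bytes)));  // true
        System.out.println(original.equals(decodeAsUtf8(utf16Bytes))); // false
    }
}
```

Within a single Java program the mismatch stays hidden as long as the same Charset is used in both directions, which would be consistent with the test code above working without ill effect.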

On Thu, Sep 29, 2022 at 3:01 PM James Henderson <j...@juxt.pro> wrote:

> FWIW we'd made a similar assumption. In Schema.fbs [1] the type is called
> Utf8, as well as the Java `ArrowType.Utf8` class - is this a required
> assumption to work with other language Arrow libs, maybe?
>
> James
>
> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs
>
> On Thu, 29 Sept 2022 at 18:57, Larry White <ljw1...@gmail.com> wrote:
>
> > Hi Kevin,
> >
> > I don't know of any particular restriction regarding string encoding.
> > VarCharVector stores data as a byte array, and the encoding can be set
> > using the Charset class when you convert Strings to and from bytes. Since
> > Java strings use UTF-16 internally, I would expect this to 'just work'.
> >
> > larry
> >
> > On Thu, Sep 29, 2022 at 12:46 PM Kevin Bambrick <kevinbambri...@gmail.com> wrote:
> >
> > > Hi.
> > >
> > > Was just wondering was support for UTF-16 Strings considered? As far as I
> > > am aware VarChar vectors only support UTF-8. Are they something that may be
> > > supported in the future?
> > >
> > > Regards.
> > > Kevin.
> > >
> >
>
>
> --
> *James Henderson*
> XTDB Development Manager at *JUXT*
>
> Email j...@juxt.pro
> Website https://juxt.pro
>
