On Mon, 25 Nov 2019 09:12:21 -0600
Wes McKinney <wesmck...@gmail.com> wrote:
> On Mon, Nov 25, 2019 at 8:52 AM Antoine Pitrou <anto...@python.org> wrote:
> >
> > Hello,
> >
> > The spec has the following language about union type ids:
> > """
> > Types buffer: A buffer of 8-bit signed integers. Each type in the union
> > has a corresponding type id whose values are found in this buffer. A
> > union with more than 127 possible types can be modeled as a union of unions.
> > """
> > https://arrow.apache.org/docs/format/Columnar.html#union-layout
> >
> > However, in several places the C++ code assumes type ids are unsigned.
> > Java doesn't seem to implement type ids (and there is no integration
> > task for union types).
> >
> > In the flatbuffers description, the type ids array is modeled as an
> > array of signed 32-bit integers.
> >
> > Moreover, according to the language above, type ids should be restricted
> > to the [0, 127] interval? Which one should it be?
>
> The (optional) type ids in the metadata provide a correspondence
> between the union types / children and the values found in the types
> buffer (data). As stated in the spec, the types buffer are 8-bit
> signed integers. As I recall the reason that we used [ Int ] in the
> metadata was that the Int type is thought to be easier for languages
> to work with in general when serializing/deserializing the metadata.

Ok, but is there a reason the C++ code uses `std::vector<uint8_t>` for
the type codes?

Regards

Antoine.
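
For illustration, the correspondence discussed above (type ids listed in the metadata as 32-bit ints, values in the types buffer stored as 8-bit signed integers) can be sketched in C++ as follows. This is only a minimal sketch: `IsValidTypeId` and `ChildIndexFor` are hypothetical helper names, not functions from the Arrow C++ library, and the [0, 127] restriction is assumed from the spec wording quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow API): checks that a type id taken from
// the metadata fits the [0, 127] interval implied by the quoted spec text.
bool IsValidTypeId(int32_t id) {
  return id >= 0 && id <= 127;
}

// Hypothetical helper (not an Arrow API): given the type ids listed in the
// metadata (one per child, in child order) and one value read from the
// 8-bit signed types buffer, returns the index of the matching child,
// or -1 if no child uses that type id.
int ChildIndexFor(const std::vector<int32_t>& metadata_type_ids,
                  int8_t buffer_value) {
  for (std::size_t child = 0; child < metadata_type_ids.size(); ++child) {
    if (metadata_type_ids[child] == static_cast<int32_t>(buffer_value)) {
      return static_cast<int>(child);
    }
  }
  return -1;
}
```

With metadata type ids `{5, 10, 7}`, a types buffer value of 10 resolves to child 1, while a value of 3 or -1 resolves to no child. Any id in [0, 127] has the same bit pattern whether the byte is read as `int8_t` or `uint8_t`, so the signed/unsigned choice only changes behaviour for values outside that interval.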