Jorge, I think your analysis is correct. Some historical context on why there is an indirection is covered in the original JIRA: https://issues.apache.org/jira/browse/ARROW-257
Some other discussions:
https://lists.apache.org/x/thread.html/75028183d54cb4f6ff588b043fe126f10b2cba8e373673fad6ba889d@%3Cdev.arrow.apache.org%3E
https://lists.apache.org/x/thread.html/b219ef51dda71bef83dcdec94e68e2881d49f751b29a8c1251f653d5@%3Cdev.arrow.apache.org%3E

-Micah

On Fri, Aug 13, 2021 at 10:57 AM Keith Kraus <keith.j.kr...@gmail.com> wrote:

> How would using the typeid directly work with arbitrary Extension types?
>
> -Keith
>
> On Fri, Aug 13, 2021 at 12:49 PM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
>
> > Hi,
> >
> > In the UnionArray, there is a level of indirection between types (buffer of
> > i8s) -> typeId (i8) -> field. For example, the generated_union part of our
> > integration tests has the data:
> >
> > types: [5, 5, 5, 5, 7, 7, 7, 7, 5, 5, 7] (len = 11)
> > typeids: [5, 7]
> > fields: [int32, utf8]
> >
> > My understanding is that, to get the field of item 4, we read types[4] (7),
> > look for the index of it in typeids (1), and take the field of index 1
> > (utf8), and then read the value (4 or other depending on sparseness).
> >
> > Does someone know the rationale for the intermediate typeid? I.e., couldn't
> > the types contain the index of the field directly [0, 0, 0, 0, 1, 1, 1, 1,
> > 0, 0, 1] (replace 5 by 0, 7 by 1, and not use typeids)?
> >
> > Best,
> > Jorge
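
For reference, the lookup described in the quoted message can be sketched in plain Python. This is only an illustration of the indirection as described there, not how any Arrow implementation does it: no Arrow library is used, `resolve_field` is a hypothetical helper name, and the lists are the values from the generated_union example.

```python
def resolve_field(types, type_ids, fields, i):
    """Return the field that slot i refers to, going through type_ids."""
    type_id = types[i]                      # value stored in the types buffer
    child_index = type_ids.index(type_id)   # position of that id in type_ids
    return fields[child_index]              # field at that position


# Data from the generated_union example in the quoted message.
types = [5, 5, 5, 5, 7, 7, 7, 7, 5, 5, 7]
type_ids = [5, 7]
fields = ["int32", "utf8"]

# Item 4: types[4] is 7, which sits at index 1 of type_ids, so the field is utf8.
field_of_item_4 = resolve_field(types, type_ids, fields, 4)

# The alternative raised in the question: store the field index directly
# in the types buffer, which would make type_ids unnecessary.
direct_types = [type_ids.index(t) for t in types]
```

With the direct encoding, the lookup would reduce to `fields[direct_types[i]]`.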