rongcuid opened a new issue, #39569:
URL: https://github.com/apache/arrow/issues/39569

   ### Describe the enhancement requested
   
   Currently, the columnar format is documented only on this page: 
https://arrow.apache.org/docs/format/Columnar.html. However, when I tried to 
actually implement the format, I found the physical representation 
underdocumented. 
   
   In particular, the encoding of primitive types is unclear. The only 
information given is an example int32 layout; no layouts are given for the 
other types. How are booleans represented, for example? Do implementations 
choose which representation they use? I suppose not, as that would defeat 
Arrow's goal of a common, language-independent format.
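   
   For reference, here is a rough sketch of how I read the one int32 example 
on that page (the array `[1, null, 2, 4, 8]`). The helper names are made up 
for illustration. It assumes little-endian values and least-significant-bit 
numbering in the validity bitmap, and it leaves out the padding of each 
buffer. It is my interpretation, not an authoritative description.

```python
import struct


def pack_validity(values):
    """Build a validity bitmap: bit (i % 8) of byte (i // 8) is 1 if slot i is non-null."""
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def pack_int32_values(values):
    """Pack each slot as a 4-byte little-endian signed integer.

    Null slots are written as 0 here; that is an arbitrary choice for this
    sketch, since the value stored in a null slot is left unspecified.
    """
    return b"".join(struct.pack("<i", 0 if v is None else v) for v in values)


values = [1, None, 2, 4, 8]
validity = pack_validity(values)
data = pack_int32_values(values)

print(format(validity[0], "08b"))  # 00011101
print(data.hex(" ", 4))            # 01000000 00000000 02000000 04000000 08000000
```

   An equivalent statement for every other type (booleans, decimals, 
timestamps, and so on) is what I could not find.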
   
   I was pointed to https://github.com/apache/arrow/blob/main/format/Schema.fbs 
for reference. However, as far as I understand, this specification covers only 
the IPC schema. It specifies type information, but when it comes to the 
physical representation, there is only `struct Buffer` with an offset and a 
length.
   
   I would like clear documentation of the memory layout of every type 
supported by Arrow. One example of such a specification is 
[CTF](https://diamon.org/ctf/v1.8.3/), which provides not only layouts of all 
types, but also side-by-side examples of schema, layout, and values. Similar 
documentation would be immensely helpful for Arrow, especially if it showed 
the layouts of the various array types.
   
   ### Component(s)
   
   Format

