Wes McKinney created ARROW-262:
----------------------------------
Summary: [Format] Add a new format document for metadata and
logical types for messaging and IPC / on-wire/file representations
Key: ARROW-262
URL: https://issues.apache.org/jira/browse/ARROW-262
Project: Apache Arrow
Issue Type: New Feature
Components: Format
Reporter: Wes McKinney
Assignee: Wes McKinney
The existing document
https://github.com/apache/arrow/blob/master/format/Layout.md
only describes the physical layout of fixed-size, variable-size, and other
nested types (struct, union).
Meanwhile, we have begun drafting a FlatBuffers IDL for Arrow metadata:
https://github.com/apache/arrow/blob/master/format/Message.fbs
I will add a document that will, to begin with:
* Explain how logical types are defined in the metadata. For example,
definitions of important data types: integers, floating point, boolean, string
(UTF-8), and binary
* Where relevant, describe how each logical type's physical memory is
converted to metadata for messaging purposes (e.g. the {{RecordBatch}} concept
in the IDL)
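
To illustrate the distinction between a logical type's physical memory and the
metadata that describes it, here is a minimal Python sketch. It assumes the
variable-size binary layout from Layout.md (validity bitmap, int32 offsets,
data bytes) and omits buffer padding. The names encode_utf8_column and
describe_column and the fields of the metadata dictionary are hypothetical;
they do not correspond to the Message.fbs schema.

```python
import struct
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PhysicalBuffers:
    """Hypothetical container for the buffers backing one string column."""
    validity: bytes  # 1 bit per value, least-significant bit first
    offsets: bytes   # (n + 1) little-endian int32 offsets into `data`
    data: bytes      # concatenated UTF-8 bytes of the non-null values


def encode_utf8_column(values: List[Optional[str]]) -> PhysicalBuffers:
    """Lay out a list of optional strings as validity/offsets/data buffers."""
    bitmap = bytearray((len(values) + 7) // 8)
    offsets = [0]
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # mark slot i as non-null
            data += value.encode("utf-8")
        offsets.append(len(data))  # a null slot repeats the previous offset
    return PhysicalBuffers(
        validity=bytes(bitmap),
        offsets=struct.pack(f"<{len(offsets)}i", *offsets),
        data=bytes(data),
    )


def describe_column(name: str, values: List[Optional[str]],
                    buffers: PhysicalBuffers) -> Dict[str, object]:
    """Build illustrative (non-canonical) metadata for the encoded column."""
    return {
        "name": name,
        "logical_type": "utf8",
        "length": len(values),
        "null_count": sum(v is None for v in values),
        "buffer_lengths": [len(buffers.validity), len(buffers.offsets),
                           len(buffers.data)],
    }
```

The metadata carries only the logical type, counts, and buffer sizes, so a
reader of the message can locate and interpret the buffers without copying or
parsing the values themselves.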
We have already begun prototype implementations in the C++ codebase
(https://github.com/apache/arrow/tree/master/cpp/src/arrow/ipc) so this will
serve as implementation-agnostic documentation.
Subsequently, I will submit a follow-up patch for discussion that aims to
address the metadata shortfall between the canonical Arrow metadata and the
similar metadata used by the bespoke Feather format
(https://github.com/wesm/feather/blob/master/cpp/src/feather/metadata.fbs).