jorisvandenbossche opened a new issue, #42011: URL: https://github.com/apache/arrow/issues/42011
Currently, the physical layouts of the Arrow Columnar Format specification are explained in https://arrow.apache.org/docs/dev/format/Columnar.html. But for how those layouts are used in practice for the different data types, and how each data type's data and parameters should be interpreted, we refer to [Schema.fbs](https://github.com/apache/arrow/blob/main/format/Schema.fbs).

This ensures there is currently one source of truth for this information, but having it "hidden" in that file also has several downsides:

- The `Schema.fbs` file is actually for IPC serialization, so it contains some content that is not relevant for just the in-memory columnar format.
- To fully understand the format spec, you need to read both the docs about the layouts and the fbs file for the data types. It would be easier to follow if that content lived together in a single document instead of being split across two places.
- Referring to an fbs file in the repo just to find prose documentation is not a pleasant reader experience (e.g. it would render better in the docs, we can use links, etc).

Therefore, I propose moving the bulk of the explanations about the different data types and parameters to Columnar.rst, to have a cleaner separation between what is the core columnar format and what is specific to the IPC spec.

Then the question is how to deal with the duplication with the fbs file: I think we don't want two places to keep in sync, but would it be fine to largely cut down the content in the fbs file?
