pitrou commented on code in PR #41958:
URL: https://github.com/apache/arrow/pull/41958#discussion_r1627814694

##########
docs/source/format/Columnar.rst:
##########
@@ -70,21 +70,131 @@ concepts, here is a small glossary to help disambiguate.
   without taking into account any value semantics. For example, a
   32-bit signed integer array and 32-bit floating point array have
   the same layout.
-* **Parent** and **child arrays**: names to express relationships
-  between physical value arrays in a nested type structure. For
-  example, a ``List<T>``-type parent array has a T-type array as its
-  child (see more on lists below).
+* **Data type**: An application-facing semantic value type that is
+  implemented using some physical layout. For example, Decimal128
+  values are stored as 16 bytes in a fixed-size binary
+  layout. A timestamp may be stored as 64-bit fixed-size layout.
 * **Primitive type**: a data type having no child types. This
   includes such types as fixed bit-width, variable-size binary, and
   null types.
 * **Nested type**: a data type whose full structure depends on one
   or more other child types. Two fully-specified nested types are
   equal if and only if their child types are equal. For example,
   ``List<U>`` is distinct from ``List<V>`` iff U and V are different
   types.
-* **Logical type**: An application-facing semantic value type that is
-  implemented using some physical layout. For example, Decimal
-  values are stored as 16 bytes in a fixed-size binary
-  layout. Similarly, strings can be stored as ``List<1-byte>``. A
-  timestamp may be stored as 64-bit fixed-size layout.
+* **Parent** and **child arrays**: names to express relationships
+  between physical value arrays in a nested type structure. For
+  example, a ``List<T>``-type parent array has a T-type array as its
+  child (see more on lists below).
+* **Parametric type**: a type which requires additional parameters
+  for full determination of its semantics. For example, all nested types
+  are parametric by construction. A timestamp is also parametric as it needs
+  a unit (such as microseconds) and a timezone.
Review Comment: Ahah, good question. In the context of the Flatbuffers serialization, yes. But there's only a small fixed number of possible parameters, so this is less obvious than for timestamps or nested types, where there's a virtual infinity of possibilities. As @jorisvandenbossche rightfully pointed out below, the Flatbuffers serialization in the IPC format is only one possible representation of the type system; others could work differently; as a matter of fact, the [C Data Interface](https://arrow.apache.org/docs/format/CDataInterface.html#data-type-description-format-strings) uses entirely different format codes for these types.
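
To make the contrast between the two representations concrete, here is a minimal sketch. It assumes, purely for illustration, that the types under discussion are fixed-width integers (the comment does not name them). `IntType`, `C_DATA_FORMAT` and `timestamp_format` are hypothetical names and not part of any Arrow library. The `bit_width`/`is_signed` fields mirror the `Int` table in the Flatbuffers schema, and the format codes are those listed in the C Data Interface documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    """Parametric view (hypothetical class): one type, two parameters,
    mirroring the Flatbuffers `Int` table (bitWidth, is_signed)."""
    bit_width: int
    is_signed: bool


# C Data Interface view: the parameter space is small and fixed, so each
# (bit_width, is_signed) combination simply gets its own format code.
C_DATA_FORMAT = {
    IntType(8, True): "c", IntType(8, False): "C",
    IntType(16, True): "s", IntType(16, False): "S",
    IntType(32, True): "i", IntType(32, False): "I",
    IntType(64, True): "l", IntType(64, False): "L",
}


def timestamp_format(unit: str, timezone: str = "") -> str:
    """Build a C Data Interface timestamp format string.

    The timezone is an arbitrary string, so the parameter space is
    open-ended and cannot be enumerated as a table of fixed codes.
    """
    unit_code = {"s": "s", "ms": "m", "us": "u", "ns": "n"}[unit]
    return f"ts{unit_code}:{timezone}"


print(C_DATA_FORMAT[IntType(32, True)])
print(timestamp_format("us", "UTC"))
```

In this sketch the integer parameters can be folded into a closed table of eight codes, whereas the timestamp still has to carry its parameters inside the format string.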
