pitrou commented on code in PR #41187:
URL: https://github.com/apache/arrow/pull/41187#discussion_r1591080513

##########
docs/source/cpp/parquet.rst:
##########
@@ -542,6 +542,19 @@ As an example, when serializing an Arrow LargeList to Parquet:
 :func:`ArrowWriterProperties::store_schema` was enabled when writing the file; otherwise, it is decoded as an Arrow List.

+Field Id
+----------

Review Comment:
The header underline needs to have the same length

```suggestion
Field Id
--------
```

##########
docs/source/cpp/parquet.rst:
##########
@@ -542,6 +542,19 @@ As an example, when serializing an Arrow LargeList to Parquet:
 :func:`ArrowWriterProperties::store_schema` was enabled when writing the file; otherwise, it is decoded as an Arrow List.

+Field Id
+----------
+
+The Parquet format supports an optional integer "field id" which can be assigned
+to a field. This is used in the `iceberg specification <https://github.com/apache/iceberg/blob/main/format/spec.md#column-projection>` __

Review Comment:

```suggestion
to a field. This is used for example in the `Apache Iceberg specification <https://github.com/apache/iceberg/blob/main/format/spec.md#column-projection>`__.
```

##########
docs/source/cpp/parquet.rst:
##########
@@ -542,6 +542,19 @@ As an example, when serializing an Arrow LargeList to Parquet:
 :func:`ArrowWriterProperties::store_schema` was enabled when writing the file; otherwise, it is decoded as an Arrow List.

+Field Id
+----------
+
+The Parquet format supports an optional integer "field id" which can be assigned
+to a field. This is used in the `iceberg specification <https://github.com/apache/iceberg/blob/main/format/spec.md#column-projection>` __
+
+On writer side, If ``PARQUET:field_id`` is present as a metadata key on a field,
+and the corresponding value is a non-negative integer, then it will be used as
+the "field id" in the parquet file.
+
+On reader side, Arrow will convert these "field id"s to a metadata key named
+``PARQUET:field_id`` on the appropriate field.

Review Comment:

```suggestion
On the reader side, Arrow will convert these "field id"s to a metadata key named
``PARQUET:field_id`` on the corresponding Arrow field.
```

##########
docs/source/cpp/parquet.rst:
##########
@@ -542,6 +542,19 @@ As an example, when serializing an Arrow LargeList to Parquet:
 :func:`ArrowWriterProperties::store_schema` was enabled when writing the file; otherwise, it is decoded as an Arrow List.

+Field Id
+----------
+
+The Parquet format supports an optional integer "field id" which can be assigned
+to a field. This is used in the `iceberg specification <https://github.com/apache/iceberg/blob/main/format/spec.md#column-projection>` __
+
+On writer side, If ``PARQUET:field_id`` is present as a metadata key on a field,
+and the corresponding value is a non-negative integer, then it will be used as
+the "field id" in the parquet file.

Review Comment:

```suggestion
On the writer side, If ``PARQUET:field_id`` is present as a metadata key on an Arrow field,
and the corresponding value is a non-negative integer, then it will be used as
the "field id" in the Parquet file.
```

##########
docs/source/cpp/parquet.rst:
##########
@@ -542,6 +542,19 @@ As an example, when serializing an Arrow LargeList to Parquet:
 :func:`ArrowWriterProperties::store_schema` was enabled when writing the file; otherwise, it is decoded as an Arrow List.

+Field Id

Review Comment:
This section should be moved below "Serialization details" below, so that the structure of the document makes sense (you can preview it in the GitHub UI).
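
For readers of this thread, the writer-side and reader-side rules in the proposed text can be sketched as follows. This is a minimal pure-Python illustration of the rule as worded in the diff, not Arrow's implementation: the helper names `field_id_from_metadata` and `metadata_from_field_id` are hypothetical, and accepting only plain ASCII digit strings is an assumption about how strictly the value is parsed.

```python
from typing import Dict, Mapping, Optional

# Metadata key named in the reviewed documentation text.
FIELD_ID_KEY = "PARQUET:field_id"


def field_id_from_metadata(metadata: Mapping[str, str]) -> Optional[int]:
    """Writer side (hypothetical helper): pick the field id to store, if any.

    Returns the integer only when the key is present and its value is a
    non-negative integer; otherwise returns None (no field id is written).
    """
    raw = metadata.get(FIELD_ID_KEY)
    if raw is None:
        return None
    # Assumption: only plain ASCII digits count as a non-negative integer,
    # so "-1", "abc", "" and " 7" are all rejected.
    if not (raw.isascii() and raw.isdigit()):
        return None
    return int(raw)


def metadata_from_field_id(field_id: Optional[int]) -> Dict[str, str]:
    """Reader side (hypothetical helper): expose a field id as field metadata."""
    if field_id is None:
        return {}
    return {FIELD_ID_KEY: str(field_id)}
```

A field carrying `{"PARQUET:field_id": "7"}` would be written with field id 7 under this sketch, while a negative or non-numeric value would leave the field id unset.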