Re: [PR] GH-41186: [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst [arrow]

via GitHub Mon, 13 May 2024 05:40:31 -0700


pitrou commented on code in PR #41187:
URL: https://github.com/apache/arrow/pull/41187#discussion_r1598410714



##########
docs/source/cpp/parquet.rst:
##########
@@ -542,13 +542,28 @@ As an example, when serializing an Arrow LargeList to Parquet:
   :func:`ArrowWriterProperties::store_schema` was enabled when writing the file;
   otherwise, it is decoded as an Arrow List.
 
+
 Serialization details
 """""""""""""""""""""
 
 The Arrow schema is serialized as a :ref:`Arrow IPC <format-ipc>` schema message,
 then base64-encoded and stored under the ``ARROW:schema`` metadata key in
 the Parquet file metadata.
 
+Field Id
+--------
+
+The Parquet format supports an optional integer "field id" which can be assigned
+to a field. This is used for example in the
+`Apache Iceberg specification <https://github.com/apache/iceberg/blob/main/format/spec.md#column-projection>`__.
+
+On the writer side, If ``PARQUET:field_id`` is present as a metadata key on an Arrow field,
+and the corresponding value is a non-negative integer, then it will be used as
+the "field id" in the Parquet file.
+
+On the reader side, Arrow will convert these "field id"s to a metadata key named
+``PARQUET:field_id`` on the corresponding Arrow field.
+
 Limitations

Review Comment:
   Can you take a look at the rendered output (you can preview it in the GitHub UI) and make sure the headings are structurally consistent? In particular, the "Limitations" heading below relates to supported data types.
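
   For context, the writer and reader rules stated in the new "Field Id" section can be modelled roughly as below. This is a plain-Python sketch, not Arrow's implementation: the helper names `writer_field_id` and `reader_metadata` are hypothetical, and treating a missing or invalid value as "no field id" is an assumption, since the patch only describes the non-negative integer case.

```python
from typing import Dict, Optional

# Metadata key named in the patch.
FIELD_ID_KEY = "PARQUET:field_id"


def writer_field_id(metadata: Dict[str, str]) -> Optional[int]:
    """Hypothetical model of the writer rule.

    Returns the field id to store in the Parquet file when the metadata
    value is a non-negative integer. Returning None otherwise is an
    assumption of this sketch.
    """
    raw = metadata.get(FIELD_ID_KEY)
    # Accept only plain ASCII digit strings, i.e. non-negative integers.
    if raw is None or not (raw.isascii() and raw.isdigit()):
        return None
    return int(raw)


def reader_metadata(field_id: Optional[int]) -> Dict[str, str]:
    """Hypothetical model of the reader rule.

    Exposes a Parquet field id as a PARQUET:field_id metadata entry on
    the corresponding Arrow field.
    """
    if field_id is None:
        return {}
    return {FIELD_ID_KEY: str(field_id)}
```

   Under these assumptions, a field id survives a write followed by a read: `reader_metadata(writer_field_id(m))` returns the original `PARQUET:field_id` entry whenever `m` carries a valid one.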



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
