Re: [PR] GH-41186: [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst [arrow]

via GitHub Mon, 06 May 2024 07:22:05 -0700


pitrou commented on code in PR #41187:
URL: https://github.com/apache/arrow/pull/41187#discussion_r1591080513



##########
docs/source/cpp/parquet.rst:
##########
@@ -542,6 +542,19 @@ As an example, when serializing an Arrow LargeList to Parquet:
   :func:`ArrowWriterProperties::store_schema` was enabled when writing the file;
   otherwise, it is decoded as an Arrow List.
 
+Field Id
+----------

Review Comment:
   The header underline needs to have the same length as the header text:
   ```suggestion
   Field Id
   --------
   ```
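
A check like this can be automated. The following is a minimal sketch, not part of the Arrow tooling: the hypothetical helper `find_mismatched_underlines` scans reStructuredText source for lines made of a single repeated punctuation character and reports those whose length differs from the title line above. Docutils itself only complains when the underline is shorter than the title, so flagging any mismatch is a stricter, stylistic assumption.

```python
import re

# A line consisting of one ASCII punctuation character repeated at least twice.
_ADORNMENT = re.compile(r"^([!-/:-@\[-`{-~])\1+$")


def find_mismatched_underlines(text):
    """Return (line_number, title, underline) for each mismatched underline.

    line_number is the 1-based line of the underline itself.
    """
    lines = text.splitlines()
    problems = []
    for i in range(1, len(lines)):
        title = lines[i - 1].rstrip()
        underline = lines[i].rstrip()
        # Skip blank lines and lines that are themselves adornments.
        if not title or _ADORNMENT.match(title):
            continue
        if _ADORNMENT.match(underline) and len(underline) != len(title):
            problems.append((i + 1, title, underline))
    return problems


sample = (
    "Field Id\n"
    "----------\n"
    "\n"
    "Serialization details\n"
    + "-" * len("Serialization details") + "\n"
)
print(find_mismatched_underlines(sample))
```

Run on the sample, only the "Field Id" heading is reported, since its underline has ten dashes for an eight-character title.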



##########
docs/source/cpp/parquet.rst:
##########
@@ -542,6 +542,19 @@ As an example, when serializing an Arrow LargeList to Parquet:
   :func:`ArrowWriterProperties::store_schema` was enabled when writing the file;
   otherwise, it is decoded as an Arrow List.
 
+Field Id
+----------
+
+The Parquet format supports an optional integer "field id" which can be assigned
+to a field. This is used in the `iceberg specification <https://github.com/apache/iceberg/blob/main/format/spec.md#column-projection>` __

Review Comment:
   ```suggestion
   to a field. This is used for example in the
   `Apache Iceberg specification <https://github.com/apache/iceberg/blob/main/format/spec.md#column-projection>`__.
   ```



##########
docs/source/cpp/parquet.rst:
##########
@@ -542,6 +542,19 @@ As an example, when serializing an Arrow LargeList to Parquet:
   :func:`ArrowWriterProperties::store_schema` was enabled when writing the file;
   otherwise, it is decoded as an Arrow List.
 
+Field Id
+----------
+
+The Parquet format supports an optional integer "field id" which can be assigned
+to a field. This is used in the `iceberg specification <https://github.com/apache/iceberg/blob/main/format/spec.md#column-projection>` __
+
+On writer side, If ``PARQUET:field_id`` is present as a metadata key on a field,
+and the corresponding value is a non-negative integer, then it will be used as 
+the "field id" in the parquet file.
+
+On reader side, Arrow will convert these "field id"s to a metadata key named
+``PARQUET:field_id`` on the appropriate field.

Review Comment:
   ```suggestion
   On the reader side, Arrow will convert these "field id"s to a metadata key named
   ``PARQUET:field_id`` on the corresponding Arrow field.
   ```



##########
docs/source/cpp/parquet.rst:
##########
@@ -542,6 +542,19 @@ As an example, when serializing an Arrow LargeList to Parquet:
   :func:`ArrowWriterProperties::store_schema` was enabled when writing the file;
   otherwise, it is decoded as an Arrow List.
 
+Field Id
+----------
+
+The Parquet format supports an optional integer "field id" which can be assigned
+to a field. This is used in the `iceberg specification <https://github.com/apache/iceberg/blob/main/format/spec.md#column-projection>` __
+
+On writer side, If ``PARQUET:field_id`` is present as a metadata key on a field,
+and the corresponding value is a non-negative integer, then it will be used as 
+the "field id" in the parquet file.

Review Comment:
   ```suggestion
   On the writer side, if ``PARQUET:field_id`` is present as a metadata key on an Arrow field,
   and the corresponding value is a non-negative integer, then it will be used as
   the "field id" in the Parquet file.
   ```
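
To make the rule concrete, here is a small pure-Python model of it. This is only a sketch under stated assumptions, not Arrow's implementation or API: the helper names `field_id_from_metadata` and `metadata_from_field_id` are hypothetical, metadata is modelled as a plain `dict` of strings, and treating a missing, negative, or non-integer value as "no field id" is an assumption, since the text above only describes the case where the value is a non-negative integer.

```python
FIELD_ID_KEY = "PARQUET:field_id"


def field_id_from_metadata(metadata):
    """Writer side (model): return the field id to store, or None.

    Assumption: anything other than a non-negative integer value
    is treated as "no field id".
    """
    raw = (metadata or {}).get(FIELD_ID_KEY)
    if raw is None:
        return None
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return None
    return value if value >= 0 else None


def metadata_from_field_id(field_id):
    """Reader side (model): expose a stored field id as a metadata entry."""
    if field_id is None:
        return {}
    return {FIELD_ID_KEY: str(field_id)}


# Round trip: the metadata key survives a write followed by a read.
written = field_id_from_metadata({FIELD_ID_KEY: "7"})
print(written, metadata_from_field_id(written))
```

Note that `int()` is more lenient than a strict digit check (it accepts surrounding whitespace, for instance), so a real implementation may reject inputs this model accepts.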



##########
docs/source/cpp/parquet.rst:
##########
@@ -542,6 +542,19 @@ As an example, when serializing an Arrow LargeList to Parquet:
   :func:`ArrowWriterProperties::store_schema` was enabled when writing the file;
   otherwise, it is decoded as an Arrow List.
 
+Field Id

Review Comment:
   This section should be moved below "Serialization details", so that the
   structure of the document makes sense (you can preview it in the GitHub UI).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
