jorisvandenbossche commented on code in PR #38360:
URL: https://github.com/apache/arrow/pull/38360#discussion_r1392194526


##########
python/pyarrow/_parquet.pyx:
##########
@@ -1703,6 +1708,13 @@ cdef shared_ptr[WriterProperties] _create_writer_properties(
     # a size larger than this then it will be latched to this value.
     props.max_row_group_length(_MAX_ROW_GROUP_SIZE)
 
+    # checksum
+
+    if page_checksum_enabled:
+        props.enable_page_checksum()
+    else:
+        props.disable_page_checksum()

Review Comment:
   Small naming suggestion: in the Python API, several other keywords that use 
"enable" terminology on the C++ side use "write_" or "use_" on the Python side. 
For example, "enable_statistics" on the C++ side is "write_statistics" here.
   
   So maybe we could also use `write_page_checksum` for the Python user-facing 
keyword.
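
   To make the suggestion concrete, here is a minimal sketch of how the renamed 
keyword could map onto the C++-style enable/disable calls. 
`FakeWriterPropertiesBuilder` and `create_writer_properties` are hypothetical 
stand-ins written for illustration, not the actual Cython code in this PR.

```python
class FakeWriterPropertiesBuilder:
    """Hypothetical stand-in for the C++ WriterProperties builder."""

    def __init__(self):
        self.page_checksum = False

    def enable_page_checksum(self):
        self.page_checksum = True

    def disable_page_checksum(self):
        self.page_checksum = False


def create_writer_properties(write_page_checksum=False):
    # The Python-facing keyword uses the "write_" prefix (like
    # write_statistics), while the builder keeps the C++ "enable_" naming.
    props = FakeWriterPropertiesBuilder()
    if write_page_checksum:
        props.enable_page_checksum()
    else:
        props.disable_page_checksum()
    return props
```

   Only the user-facing keyword changes with this naming; the calls into the 
builder stay as they are in the diff above.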



##########
python/pyarrow/parquet/core.py:
##########
@@ -887,6 +891,10 @@ def _sanitize_table(table, new_schema, flavor):
     filtering more efficient than the page header, as it gathers all the
     statistics for a Parquet file in a single place, avoiding scattered I/O.
     Note that the page index is not yet used on the read size by PyArrow.
+page_checksum_enabled : bool, default False
+    Whether to write page checksums in general for all columns.
+    Page checksums enable detection of corruption, which might occur during

Review Comment:
   ```suggestion
       Page checksums enable detection of data corruption, which might occur during
   ```
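
   For readers of the docstring, a rough sketch of what "detection of data 
corruption" means in practice, assuming a CRC32 checksum over the page bytes. 
It uses only the standard library's `zlib.crc32`; `page_checksum` and 
`verify_page` are illustrative helper names, not PyArrow functions.

```python
import zlib


def page_checksum(page_bytes: bytes) -> int:
    # CRC32 of the raw page bytes, masked to an unsigned 32-bit value.
    return zlib.crc32(page_bytes) & 0xFFFFFFFF


def verify_page(page_bytes: bytes, stored_crc: int) -> bool:
    # A reader recomputes the checksum and compares it to the stored one.
    return page_checksum(page_bytes) == stored_crc


page = b"example page payload"
stored = page_checksum(page)

# Simulate a single flipped bit, e.g. from a faulty disk or network transfer.
corrupted = bytearray(page)
corrupted[0] ^= 0x01

print(verify_page(page, stored))
print(verify_page(bytes(corrupted), stored))
```

   The intact page verifies, while the page with one flipped bit does not, since 
CRC32 catches every single-bit error.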



