kszucs commented on code in PR #45360:
URL: https://github.com/apache/arrow/pull/45360#discussion_r2083096080


##########
cpp/src/parquet/properties.h:
##########
@@ -245,6 +245,34 @@ class PARQUET_EXPORT ColumnProperties {
   bool page_index_enabled_;
 };
 
+// EXPERIMENTAL: Options for content-defined chunking.
+struct PARQUET_EXPORT CdcOptions {
+  /// Minimum chunk size in bytes, default 256 KiB
+  /// The rolling hash will not be updated until this size is reached for each chunk.
+  /// Note that all data sent through the hash function is counted towards the chunk
+  /// size, including definition and repetition levels if present.
+  int64_t min_chunk_size;

Review Comment:
   Updated to set the defaults here and to initialize kDefaultCdcOptions from 
them; also added a test case.
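
   For readers following along, here is a minimal sketch of the pattern described above, not the actual patch. The names `CdcOptionsSketch` and `kDefaultCdcOptionsSketch` are hypothetical stand-ins; only `min_chunk_size` and its 256 KiB default are taken from the diff, and the real struct presumably carries more fields.

```cpp
#include <cstdint>

// Hypothetical stand-in for the struct under review. Only min_chunk_size and
// its documented 256 KiB default come from the diff hunk above.
struct CdcOptionsSketch {
  // Default member initializer: the default value lives on the struct itself.
  int64_t min_chunk_size = 256 * 1024;
};

// Assumed shape of the shared default object: value-initialized, so it picks
// up whatever defaults the struct declares instead of repeating the numbers.
constexpr CdcOptionsSketch kDefaultCdcOptionsSketch{};

// Compile-time check that the constant and the struct default agree.
static_assert(kDefaultCdcOptionsSketch.min_chunk_size == 256 * 1024,
              "default constant should mirror the struct's own default");
```

   With this layout the default is written once, so a default-constructed options object and the shared constant cannot drift apart.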



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
