kszucs commented on code in PR #45360: URL: https://github.com/apache/arrow/pull/45360#discussion_r2028675097
##########
docs/source/python/parquet.rst:
##########
@@ -782,3 +782,57 @@ file decryption properties) is optional and it includes the following options:
 * ``cache_lifetime``, the lifetime of cached entities (key encryption keys, local wrapping keys, KMS client objects) represented as a ``datetime.timedelta``.
+
+
+Content-Defined Chunking
+------------------------
+
+.. note::
+   This feature is experimental and may change in future releases.
+
+PyArrow introduces an experimental feature for optimizing Parquet files for content
+addressable storage (CAS) systems using content-defined chunking (CDC). This feature
+enables efficient deduplication of data across files, improving network transfers and
+storage efficiency.

Review Comment:
   Added.
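For readers unfamiliar with the idea behind the quoted paragraph, here is a minimal, standard-library-only sketch of content-defined chunking. It is not PyArrow's implementation and does not touch Parquet. The names `GEAR`, `cdc_chunks`, and `digests`, as well as the mask and size limits, are assumptions chosen for illustration. It cuts a byte stream wherever a rolling hash of the most recent bytes matches a pattern, so boundaries follow the content instead of fixed offsets, and an insertion only disturbs the chunks near the edit.

```python
import hashlib
import random

# Hypothetical lookup table for a gear-style rolling hash: one pseudo-random
# 32-bit value per possible byte. Seeded so the sketch is deterministic.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data, mask=0x3F, min_size=16, max_size=256):
    """Split `data` at positions chosen by a rolling hash of recent bytes.

    A cut is made when the low bits of the hash are all zero (and the chunk
    has reached `min_size`), or when the chunk reaches `max_size`.
    """
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shifting left ages out older bytes, so the low bits of `h`
        # depend only on the last few bytes seen.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if size >= max_size or (size >= min_size and (h & mask) == 0):
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def digests(chunks):
    """Content addresses (SHA-256 hex digests) of the given chunks."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


# Illustrative data: a pseudo-random payload and a copy with a small insertion.
_data_rng = random.Random(1)
original = bytes(_data_rng.getrandbits(8) for _ in range(8192))
edited = original[:1000] + b"INSERTED" + original[1000:]

original_ids = digests(cdc_chunks(original))
edited_ids = digests(cdc_chunks(edited))
shared = original_ids & edited_ids
print(f"{len(shared)} of {len(edited_ids)} chunks in the edited copy are already stored")
```

With fixed-size chunks, the same insertion would shift every later boundary and leave little to share. In this sketch the boundaries realign shortly after the edit, and that realignment is the property a content-addressable store relies on for deduplication.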
