kosiew commented on code in PR #22632:
URL: https://github.com/apache/datafusion/pull/22632#discussion_r3333010239
##########
datafusion/common/src/file_options/parquet_writer.rs:
##########
@@ -249,7 +249,8 @@ impl ParquetOptions {
if let Some(encoding) = encoding {
builder = builder.set_encoding(parse_encoding_string(encoding)?);
}
- if let Some(cdc) = use_content_defined_chunking {
+ if content_defined_chunking.enabled {
+ let cdc = content_defined_chunking;
Review Comment:
Small readability thought: `let cdc = content_defined_chunking;` is only
giving a shorter name to the borrowed CDC options. This could be a little
flatter if we used `content_defined_chunking` directly, or bound it in the `if`
condition instead. For example: `if content_defined_chunking.enabled { ... }`.
##########
datafusion/proto-common/src/to_proto/mod.rs:
##########
@@ -938,13 +938,12 @@ impl TryFrom<&ParquetOptions> for
protobuf::ParquetOptions {
coerce_int96_opt:
value.coerce_int96.clone().map(protobuf::parquet_options::CoerceInt96Opt::CoerceInt96),
coerce_int96_tz_opt:
value.coerce_int96_tz.clone().map(protobuf::parquet_options::CoerceInt96TzOpt::CoerceInt96Tz),
max_predicate_cache_size_opt:
value.max_predicate_cache_size.map(|v|
protobuf::parquet_options::MaxPredicateCacheSizeOpt::MaxPredicateCacheSize(v as
u64)),
- content_defined_chunking:
value.use_content_defined_chunking.as_ref().map(|cdc|
- protobuf::CdcOptions {
- min_chunk_size: cdc.min_chunk_size as u64,
- max_chunk_size: cdc.max_chunk_size as u64,
- norm_level: cdc.norm_level,
- }
- ),
+ content_defined_chunking: Some(protobuf::CdcOptions {
Review Comment:
Nice work wiring this through. One small maintainability idea: the
`CdcOptions` to proto field mapping now appears here and in
`proto/src/logical_plan/file_formats.rs`. It might be worth adding a small
helper conversion, such as `impl From<&CdcOptions> for protobuf::CdcOptions` if
that fits the crate boundaries, so future CDC field changes only need one
mapping update per direction.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]