Copilot commented on code in PR #19064:
URL: https://github.com/apache/datafusion/pull/19064#discussion_r2584415638
##########
datafusion/proto-common/src/to_proto/mod.rs:
##########
@@ -881,6 +881,7 @@ impl TryFrom<&ParquetOptions> for protobuf::ParquetOptions {
skip_arrow_metadata: value.skip_arrow_metadata,
coerce_int96_opt:
value.coerce_int96.clone().map(protobuf::parquet_options::CoerceInt96Opt::CoerceInt96),
max_predicate_cache_size_opt:
value.max_predicate_cache_size.map(|v|
protobuf::parquet_options::MaxPredicateCacheSizeOpt::MaxPredicateCacheSize(v as
u64)),
+ enable_sort_pushdown: true,
Review Comment:
The `enable_sort_pushdown` field is hardcoded to `true` instead of using the
value from the actual `ParquetOptions`. This means the configuration value will
not be properly serialized.
Should be:
```rust
enable_sort_pushdown: value.enable_sort_pushdown,
```
```suggestion
enable_sort_pushdown: value.enable_sort_pushdown,
```
##########
datafusion/execution/src/config.rs:
##########
@@ -442,6 +442,12 @@ impl SessionConfig {
self
}
+ /// Enable reverse scan optimization for Parquet files
Review Comment:
The comment says "Enable reverse scan optimization for Parquet files" but
the actual functionality is more general sort pushdown optimization (including
file reordering, not just reverse scanning).
Should be:
```rust
/// Enable sort pushdown optimization for Parquet files
```
```suggestion
/// Enable sort pushdown optimization for Parquet files
```
##########
datafusion/proto/src/logical_plan/file_formats.rs:
##########
@@ -516,6 +517,7 @@ mod parquet {
max_predicate_cache_size:
proto.max_predicate_cache_size_opt.as_ref().map(|opt| match opt {
parquet_options::MaxPredicateCacheSizeOpt::MaxPredicateCacheSize(size) => *size
as usize,
}),
+ enable_sort_pushdown: true,
Review Comment:
The `enable_sort_pushdown` field is hardcoded to `true` instead of using the
value from the protobuf. This means the configuration value from the proto
message will be ignored.
Should be:
```rust
enable_sort_pushdown: proto.enable_sort_pushdown,
```
```suggestion
enable_sort_pushdown: proto.enable_sort_pushdown,
```
##########
datafusion/proto/src/logical_plan/file_formats.rs:
##########
@@ -420,6 +420,7 @@ mod parquet {
max_predicate_cache_size_opt:
global_options.global.max_predicate_cache_size.map(|size| {
parquet_options::MaxPredicateCacheSizeOpt::MaxPredicateCacheSize(size as u64)
}),
+ enable_sort_pushdown: true,
Review Comment:
The `enable_sort_pushdown` field is hardcoded to `true` instead of using the
value from `global_options.global.enable_sort_pushdown`. This means the
configuration value will be ignored when deserializing from protobuf.
Should be:
```rust
enable_sort_pushdown: global_options.global.enable_sort_pushdown,
```
```suggestion
enable_sort_pushdown:
global_options.global.enable_sort_pushdown,
```
##########
datafusion/proto-common/src/from_proto/mod.rs:
##########
@@ -1005,6 +1005,7 @@ impl TryFrom<&protobuf::ParquetOptions> for
ParquetOptions {
max_predicate_cache_size:
value.max_predicate_cache_size_opt.map(|opt| match opt {
protobuf::parquet_options::MaxPredicateCacheSizeOpt::MaxPredicateCacheSize(v)
=> Some(v as usize),
}).unwrap_or(None),
+ enable_sort_pushdown: true,
Review Comment:
The `enable_sort_pushdown` field is hardcoded to `true` instead of using the
value from the protobuf. This means the configuration value from the proto
message will be ignored.
Should be:
```rust
enable_sort_pushdown: value.enable_sort_pushdown,
```
```suggestion
enable_sort_pushdown: value.enable_sort_pushdown,
```
##########
datafusion/common/src/config.rs:
##########
@@ -831,6 +831,23 @@ config_namespace! {
/// writing out already in-memory data, such as from a cached
/// data frame.
pub maximum_buffered_record_batches_per_stream: usize, default = 2
+
+ /// Enable sort pushdown optimization for Parquet files.
+ ///
+ /// When enabled, optimizes queries with ORDER BY:
+ /// - Reordering files based on statistics
+ /// - Reversing row group read order when beneficial
+ ///
+ /// Returns **inexact ordering**: Sort operator is kept for correctness,
+ /// but can terminate early for TopK queries (ORDER BY ... LIMIT N),
+ /// providing huge speedup.
+ ///
+ /// Memory: No additional overhead (only changes read order).
+ ///
+ /// Future TODO: Will add option to support detect perfectly sorted
data and eliminate Sort completely.
+ ///
+ /// Default: true
+ pub enable_sort_pushdown: bool, default = true
Review Comment:
Incorrect indentation: the comment and field declaration use 3 spaces for
indentation instead of 8 spaces to match the surrounding configuration fields
(lines 822-833 and others in this config_namespace block).
```suggestion
/// Enable sort pushdown optimization for Parquet files.
///
/// When enabled, optimizes queries with ORDER BY:
/// - Reordering files based on statistics
/// - Reversing row group read order when beneficial
///
/// Returns **inexact ordering**: Sort operator is kept for
correctness,
/// but can terminate early for TopK queries (ORDER BY ... LIMIT N),
/// providing huge speedup.
///
/// Memory: No additional overhead (only changes read order).
///
/// Future TODO: Will add option to support detect perfectly sorted
data and eliminate Sort completely.
///
/// Default: true
pub enable_sort_pushdown: bool, default = true
```
##########
datafusion/execution/src/config.rs:
##########
@@ -442,6 +442,12 @@ impl SessionConfig {
self
}
+ /// Enable reverse scan optimization for Parquet files
+ pub fn with_parquet_reverse_scan(mut self, enabled: bool) -> Self {
Review Comment:
The method name `with_parquet_reverse_scan` is misleading because:
1. It sets the `enable_sort_pushdown` configuration option, not a
"reverse_scan" option
2. Sort pushdown includes more than just reversing scans (e.g., reordering
files based on statistics)
Should be renamed to match the actual configuration option:
```rust
pub fn with_parquet_enable_sort_pushdown(mut self, enabled: bool) -> Self {
self.options_mut().execution.parquet.enable_sort_pushdown = enabled;
self
}
```
```suggestion
/// Enables or disables sort pushdown optimization for Parquet files
pub fn with_parquet_enable_sort_pushdown(mut self, enabled: bool) ->
Self {
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]