Copilot commented on code in PR #19064:
URL: https://github.com/apache/datafusion/pull/19064#discussion_r2584415638


##########
datafusion/proto-common/src/to_proto/mod.rs:
##########
@@ -881,6 +881,7 @@ impl TryFrom<&ParquetOptions> for protobuf::ParquetOptions {
             skip_arrow_metadata: value.skip_arrow_metadata,
             coerce_int96_opt: 
value.coerce_int96.clone().map(protobuf::parquet_options::CoerceInt96Opt::CoerceInt96),
             max_predicate_cache_size_opt: 
value.max_predicate_cache_size.map(|v| 
protobuf::parquet_options::MaxPredicateCacheSizeOpt::MaxPredicateCacheSize(v as 
u64)),
+            enable_sort_pushdown: true,

Review Comment:
   The `enable_sort_pushdown` field is hardcoded to `true` instead of using the 
value from the actual `ParquetOptions`. This means the configuration value will 
not be properly serialized.
   
   Should be:
   ```rust
   enable_sort_pushdown: value.enable_sort_pushdown,
   ```
   ```suggestion
               enable_sort_pushdown: value.enable_sort_pushdown,
   ```



##########
datafusion/execution/src/config.rs:
##########
@@ -442,6 +442,12 @@ impl SessionConfig {
         self
     }
 
+    /// Enable reverse scan optimization for Parquet files

Review Comment:
   The comment says "Enable reverse scan optimization for Parquet files" but 
the actual functionality is more general sort pushdown optimization (including 
file reordering, not just reverse scanning).
   
   Should be:
   ```rust
   /// Enable sort pushdown optimization for Parquet files
   ```
   ```suggestion
       /// Enable sort pushdown optimization for Parquet files
   ```



##########
datafusion/proto/src/logical_plan/file_formats.rs:
##########
@@ -516,6 +517,7 @@ mod parquet {
             max_predicate_cache_size: 
proto.max_predicate_cache_size_opt.as_ref().map(|opt| match opt {
                 
parquet_options::MaxPredicateCacheSizeOpt::MaxPredicateCacheSize(size) => *size 
as usize,
             }),
+            enable_sort_pushdown: true,

Review Comment:
   The `enable_sort_pushdown` field is hardcoded to `true` instead of using the 
value from the protobuf. This means the configuration value from the proto 
message will be ignored.
   
   Should be:
   ```rust
   enable_sort_pushdown: proto.enable_sort_pushdown,
   ```
   ```suggestion
               enable_sort_pushdown: proto.enable_sort_pushdown,
   ```



##########
datafusion/proto/src/logical_plan/file_formats.rs:
##########
@@ -420,6 +420,7 @@ mod parquet {
                 max_predicate_cache_size_opt: 
global_options.global.max_predicate_cache_size.map(|size| {
                     
parquet_options::MaxPredicateCacheSizeOpt::MaxPredicateCacheSize(size as u64)
                 }),
+                enable_sort_pushdown: true,

Review Comment:
   The `enable_sort_pushdown` field is hardcoded to `true` instead of using the 
value from `global_options.global.enable_sort_pushdown`. This means the 
configuration value will be ignored when deserializing from protobuf.
   
   Should be:
   ```rust
   enable_sort_pushdown: global_options.global.enable_sort_pushdown,
   ```
   ```suggestion
                   enable_sort_pushdown: 
global_options.global.enable_sort_pushdown,
   ```



##########
datafusion/proto-common/src/from_proto/mod.rs:
##########
@@ -1005,6 +1005,7 @@ impl TryFrom<&protobuf::ParquetOptions> for 
ParquetOptions {
             max_predicate_cache_size: 
value.max_predicate_cache_size_opt.map(|opt| match opt {
                 
protobuf::parquet_options::MaxPredicateCacheSizeOpt::MaxPredicateCacheSize(v) 
=> Some(v as usize),
             }).unwrap_or(None),
+            enable_sort_pushdown: true,

Review Comment:
   The `enable_sort_pushdown` field is hardcoded to `true` instead of using the 
value from the protobuf. This means the configuration value from the proto 
message will be ignored.
   
   Should be:
   ```rust
   enable_sort_pushdown: value.enable_sort_pushdown,
   ```
   ```suggestion
               enable_sort_pushdown: value.enable_sort_pushdown,
   ```



##########
datafusion/common/src/config.rs:
##########
@@ -831,6 +831,23 @@ config_namespace! {
         /// writing out already in-memory data, such as from a cached
         /// data frame.
         pub maximum_buffered_record_batches_per_stream: usize, default = 2
+
+       /// Enable sort pushdown optimization for Parquet files.
+       ///
+       /// When enabled, optimizes queries with ORDER BY:
+       /// - Reordering files based on statistics
+       /// - Reversing row group read order when beneficial
+       ///
+       /// Returns **inexact ordering**: Sort operator is kept for correctness,
+       /// but can terminate early for TopK queries (ORDER BY ... LIMIT N),
+       /// providing huge speedup.
+       ///
+       /// Memory: No additional overhead (only changes read order).
+       ///
+       /// Future TODO: Will add option to support detect perfectly sorted 
data and eliminate Sort completely.
+       ///
+       /// Default: true
+       pub enable_sort_pushdown: bool, default = true

Review Comment:
   Incorrect indentation: the comment and field declaration use 3 spaces for 
indentation instead of 8 spaces to match the surrounding configuration fields 
(lines 822-833 and others in this config_namespace block).
   ```suggestion
           /// Enable sort pushdown optimization for Parquet files.
           ///
           /// When enabled, optimizes queries with ORDER BY:
           /// - Reordering files based on statistics
           /// - Reversing row group read order when beneficial
           ///
           /// Returns **inexact ordering**: Sort operator is kept for 
correctness,
           /// but can terminate early for TopK queries (ORDER BY ... LIMIT N),
           /// providing huge speedup.
           ///
           /// Memory: No additional overhead (only changes read order).
           ///
           /// Future TODO: Will add option to support detect perfectly sorted 
data and eliminate Sort completely.
           ///
           /// Default: true
           pub enable_sort_pushdown: bool, default = true
   ```



##########
datafusion/execution/src/config.rs:
##########
@@ -442,6 +442,12 @@ impl SessionConfig {
         self
     }
 
+    /// Enable reverse scan optimization for Parquet files
+    pub fn with_parquet_reverse_scan(mut self, enabled: bool) -> Self {

Review Comment:
   The method name `with_parquet_reverse_scan` is misleading because:
   1. It sets the `enable_sort_pushdown` configuration option, not a 
"reverse_scan" option
   2. Sort pushdown includes more than just reversing scans (e.g., reordering 
files based on statistics)
   
   Should be renamed to match the actual configuration option:
   ```rust
   pub fn with_parquet_enable_sort_pushdown(mut self, enabled: bool) -> Self {
       self.options_mut().execution.parquet.enable_sort_pushdown = enabled;
       self
   }
   ```
   ```suggestion
       /// Enables or disables sort pushdown optimization for Parquet files
       pub fn with_parquet_enable_sort_pushdown(mut self, enabled: bool) -> 
Self {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to