Copilot commented on code in PR #510:
URL: https://github.com/apache/hudi-rs/pull/510#discussion_r2660118954
##########
crates/datafusion/src/util/expr.rs:
##########
@@ -81,6 +130,52 @@ fn not_expr_to_filter(not_expr: &Expr) ->
Option<HudiFilter> {
}
}
+/// Converts a BETWEEN expression into two filters: >= low AND <= high.
Review Comment:
The documentation comment refers to `PartitionFilter` but the function
returns `HudiFilter`. This should be corrected to match the actual return type.
##########
crates/core/src/storage/mod.rs:
##########
@@ -297,7 +302,26 @@ impl Storage {
builder = builder.with_batch_size(batch_size);
}
- if let Some(projection) = options.projection {
+ // Handle projection: convert column names to indices using builder's
schema
+ if let Some(ref column_names) = options.projection {
+ let arrow_schema = builder.schema();
+ let projection: Vec<usize> = column_names
+ .iter()
+ .map(|name| {
+ arrow_schema.index_of(name).map_err(|_| {
+ let available = arrow_schema
+ .fields()
+ .iter()
+ .map(|f| f.name().as_str())
+ .collect::<Vec<_>>()
+ .join(", ");
+ StorageError::InvalidColumn(format!(
+ "Column '{name}' not found in parquet file schema.
Available columns: [{available}]"
+ ))
+ })
Review Comment:
The error message construction could be more efficient. Building the
available columns string on every error case (even when not needed) may have a
small performance impact when dealing with schemas with many columns.
Consider constructing the available columns string only when the error is
about to be returned (lazy evaluation), or moving it outside the map closure if
all column lookups are expected to fail together.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]