adriangb commented on code in PR #20202:
URL: https://github.com/apache/datafusion/pull/20202#discussion_r2804845043
##########
datafusion/proto/proto/datafusion.proto:
##########
@@ -896,6 +896,8 @@ message PhysicalExprNode {
UnknownColumn unknown_column = 20;
PhysicalHashExprNode hash_expr = 21;
+
+ PhysicalCastColumnNode cast_column = 22;
Review Comment:
This should be it's own PR, sequenced after adding `OwnedFormatOptions`
##########
datafusion-examples/examples/custom_data_source/custom_file_casts.rs:
##########
@@ -49,9 +49,9 @@ use object_store::{ObjectStore, PutPayload};
pub async fn custom_file_casts() -> Result<()> {
println!("=== Creating example data ===");
- // Create a logical / table schema with an Int32 column
+ // Create a logical / table schema with an Int32 column (nullable)
let logical_schema =
- Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
+ Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, true)]));
Review Comment:
This could also be it's own PR
##########
datafusion/common/src/format.rs:
##########
@@ -35,6 +35,204 @@ pub const DEFAULT_CAST_OPTIONS: CastOptions<'static> =
CastOptions {
format_options: DEFAULT_FORMAT_OPTIONS,
};
+/// Owned version of Arrow's `FormatOptions` with all `String` values instead
of `&str`.
+///
+/// Arrow's `FormatOptions<'static>` requires `&'static str` references, which
makes it
+/// difficult to work with dynamic format options. This struct uses `String`
values,
+/// allowing format options to be created and owned at runtime without
lifetime constraints.
+///
Review Comment:
Should we make changes to arrow? It seems unfortunate to have duplicate
types that differ only in their lifetime.
##########
datafusion/physical-expr-adapter/src/schema_rewriter.rs:
##########
@@ -468,7 +468,10 @@ impl DefaultPhysicalExprAdapterRewriter {
column: Column,
logical_field: &Field,
) -> Result<Transformed<Arc<dyn PhysicalExpr>>> {
- let actual_physical_field =
self.physical_file_schema.field(column.index());
+ // Look up the column index in the physical schema by name to ensure
correctness
+ let physical_column_index =
self.physical_file_schema.index_of(column.name())?;
+ let actual_physical_field =
+ self.physical_file_schema.field(physical_column_index);
Review Comment:
@kosiew this change seems like it could be it's own PR. I (and other
reviewers) would really appreciate if each PR is minimally scoped. A PR with
just this fix + some tests can be reviewed and merged in 5 minutes instead of
getting stuck behind other review comments for unrelated things. They can also
be easily cherry picked, back-ported, etc.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]