Blizzara commented on code in PR #12800:
URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792099612


##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -783,7 +783,6 @@ pub async fn from_substrait_rel(
 
                 let t = ctx.table(table_reference.clone()).await?;
                 let t = ensure_schema_compatability(t, substrait_schema)?;
-                let t = t.into_optimized_plan()?;
                 extract_projection(t, &read.projection)

Review Comment:
   > @Blizzara I'm thinking of switching gears a bit and curious what you 
think. I think the easiest solution here will be to first "apply" substrait 
projections to substrait base_schema and only then use 
`ensure_schema_compatability` with the resulting trimmed down schema as an 
input.
   
   I think that'd be fine by me!
   
   > One thing I'm unsure about is what to do when data types between table 
schema and a substrait base_schema do not match for a field which is projected 
away by substrait anyway. For example
   > 
   > * table has schema (a int, b str)
   > * substrait base_schema is (a int, b int) # note that b has an incorrect 
type
   > * substrait projections field is [0]
   > 
   > If I trim down base_schema first and only then check compatibility with 
the actual table, this won't throw an error, it will be as if column b was 
never mentioned in the plan at all. wdyt? Does that sound like the "right" 
behavior?
   
   Also fine by me, given as long as the plan can safely execute (all columns 
used in any way exist and are of right type) then I don't see a need to fail 
that. But one could as well argue the whole included schema should be validated 
for consistency reasons, so dunno if @westonpace @vbarua have stricter opinions 
here?
   
   > (On a related note, I think the same needs to applied to LocalFiles 
ReadType as well, which for some reason right now outright ignores base_schema. 
I think if we allow NamedTable base_schema to be different from the actual 
table schema, makes sense to do the same for LocalFiles).
   
   Yeah, it wasn't done yet just to keep the PR smaller, but agreed the 
behavior should be same across all source types.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to