vbarua commented on code in PR #12245:
URL: https://github.com/apache/datafusion/pull/12245#discussion_r1737503917
##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -1586,9 +1629,12 @@ fn next_struct_field_name(
}
}
-fn from_substrait_named_struct(
+/// Convert Substrait NamedStruct to DataFusion DFSchemaRef
+pub fn from_substrait_named_struct(
Review Comment:
While this is generally useful, I could see a case for keeping this internal
only for now. That was actually what I tried to do originally, but I'm new-ish
to Rust and couldn't figure out a good way to expose this for use in
integration tests only.
##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -1601,12 +1647,17 @@ fn from_substrait_named_struct(
);
if name_idx != base_schema.names.len() {
return substrait_err!(
- "Names list must match exactly to nested
schema, but found {} uses for {} names",
- name_idx,
- base_schema.names.len()
- );
+ "Names list must match exactly to nested schema, but found {} uses
for {} names",
+ name_idx,
+ base_schema.names.len()
+ );
+ }
+ let mut df_schema = DFSchema::try_from(Schema::new(fields?))?;
+ match field_qualifier {
+ None => (),
+ Some(fq) => df_schema = df_schema.replace_qualifier(fq),
}
- Ok(DFSchemaRef::new(DFSchema::try_from(Schema::new(fields?))?))
+ Ok(DFSchemaRef::new(df_schema))
Review Comment:
If this is made public, I think it might make sense to return a `DFSchema`
instead of a `DFSchemaRef` to make it more flexibly for users, whom can choose
to wrap the DFSchema if they need it.
##########
datafusion/substrait/tests/cases/function_test.rs:
##########
@@ -19,40 +19,33 @@
#[cfg(test)]
mod tests {
+ use crate::utils::test::TestSchemaCollector;
use datafusion::common::Result;
- use datafusion::prelude::{CsvReadOptions, SessionContext};
use datafusion_substrait::logical_plan::consumer::from_substrait_plan;
use std::fs::File;
use std::io::BufReader;
use substrait::proto::Plan;
#[tokio::test]
async fn contains_function_test() -> Result<()> {
- let ctx = create_context().await?;
-
let path = "tests/testdata/contains_plan.substrait.json";
- let proto = serde_json::from_reader::<_, Plan>(BufReader::new(
+
+ let proto_plan = serde_json::from_reader::<_, Plan>(BufReader::new(
File::open(path).expect("file not found"),
))
.expect("failed to parse json");
- let plan = from_substrait_plan(&ctx, &proto).await?;
+ let ctx = TestSchemaCollector::generate_context_from_plan(&proto_plan);
+ let plan = from_substrait_plan(&ctx, &proto_plan).await?;
let plan_str = format!("{}", plan);
assert_eq!(
plan_str,
- "Projection: nation.b AS n_name\
- \n Filter: contains(nation.b, Utf8(\"IA\"))\
- \n TableScan: nation projection=[a, b, c, d, e, f]"
+ "Projection: nation.n_name\
+ \n Filter: contains(nation.n_name, Utf8(\"IA\"))\
+ \n TableScan: nation projection=[n_nationkey, n_name,
n_regionkey, n_comment]"
Review Comment:
You can see in this change how the DataFusion and Substrait plans had
different schemas.
##########
datafusion/substrait/tests/utils.rs:
##########
@@ -0,0 +1,154 @@
+#[cfg(test)]
+pub(crate) mod test {
+ use datafusion::catalog_common::TableReference;
+ use datafusion::datasource::empty::EmptyTable;
+ use datafusion::prelude::SessionContext;
+ use datafusion_substrait::extensions::Extensions;
+ use
datafusion_substrait::logical_plan::consumer::from_substrait_named_struct;
+ use std::fs::File;
+ use std::io::BufReader;
+ use std::sync::Arc;
+ use substrait::proto::read_rel::{NamedTable, ReadType};
+ use substrait::proto::rel::RelType;
+ use substrait::proto::{Plan, ReadRel, Rel};
+
+ pub(crate) fn read_json(path: &str) -> Plan {
+ serde_json::from_reader::<_, Plan>(BufReader::new(
+ File::open(path).expect("file not found"),
+ ))
+ .expect("failed to parse json")
+ }
+
+ pub(crate) struct TestSchemaCollector {
+ ctx: SessionContext,
+ }
Review Comment:
This collector is based on a similar bit of tooling in
[Isthmus](https://github.com/substrait-io/substrait-java/blob/d6256487f75032e2e8fc803976d916e4a5bc44bd/isthmus/src/main/java/io/substrait/isthmus/SubstraitToCalcite.java#L102-L128),
a Substrait library used for integrating with Apache Calcite.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]