vbarua commented on code in PR #16857:
URL: https://github.com/apache/datafusion/pull/16857#discussion_r2223862112
##########
datafusion/substrait/tests/testdata/test_plans/select_count_from_select_1_virtual_table_expressions.substrait.json:
##########
@@ -0,0 +1,94 @@
+{
Review Comment:
minor: I would suggest we update the name to something like
`virtual_table_with_expressions.substrait.json` which more accurately reflects
what this plan is intended to verify.
##########
datafusion/substrait/src/logical_plan/consumer/rel/read_rel.rs:
##########
@@ -114,14 +114,37 @@ pub async fn from_read_rel(
.await
}
Some(ReadType::VirtualTable(vt)) => {
- if vt.values.is_empty() {
+ if vt.values.is_empty() && vt.expressions.is_empty() {
return Ok(LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,
schema: DFSchemaRef::new(substrait_schema),
}));
}
- let values = vt
+ let values = if vt.values.is_empty() {
+ let mut exprs = vec![];
+ for row in &vt.expressions {
+ let mut name_idx = 0;
+ let mut row_exprs = vec![];
+ for expression in &row.fields {
+ name_idx += 1;
+ let expr = consumer
+ .consume_expression(expression, &substrait_schema)
+ .await?;
+ row_exprs.push(expr);
+ }
+ if name_idx != named_struct.names.len() {
+ return substrait_err!(
+ "Names list must match exactly to nested
schema, but found {} uses for {} names",
+ name_idx,
+ named_struct.names.len()
+ );
+ }
+ exprs.push(row_exprs);
+ }
+ exprs
Review Comment:
This does introduce a bit of duplication with the block below, but given
that `consume_expression` is async and `from_substrait_literal` is not it would
be a bit tricky to unify the processing. In the long-term Substrait drop
support for the `values` field which is deprecated so we'll be able to remove
the lower block entirely.
##########
datafusion/substrait/tests/testdata/test_plans/select_count_from_select_1_virtual_table_expressions.substrait.json:
##########
@@ -0,0 +1,94 @@
+{
+ "extensionUris": [
+ {
+ "uri":
"https://github.com/substrait-io/substrait/blob/main/extensions/functions_aggregate_generic.yaml"
+ }
+ ],
+ "extensions": [
+ {
+ "extensionFunction": {
+ "functionAnchor": 185,
+ "name": "count:any"
+ }
+ }
+ ],
+ "relations": [
+ {
+ "root": {
+ "input": {
+ "aggregate": {
+ "common": {
+ "direct": {
+ }
+ },
+ "input": {
+ "read": {
+ "common": {
+ "direct": {
+ }
+ },
+ "baseSchema": {
+ "names": [
+ "dummy"
+ ],
+ "struct": {
+ "types": [
+ {
+ "i64": {
+ "nullability": "NULLABILITY_REQUIRED"
+ }
+ }
+ ],
+ "nullability": "NULLABILITY_REQUIRED"
+ }
+ },
+ "virtualTable": {
+ "expressions": [
+ {
+ "fields": [
+ {
+ "literal": {
+ "i64": "0",
+ "nullable": false
+ }
+ }
+ ]
+ }
Review Comment:
To make this a slighly more robust test, I would suggest:
* Using a struct with more than 1 field to verify the name processing is
applied correctly.
* Having more than 1 row to check that multiple rows can be processed.
##########
datafusion/substrait/src/logical_plan/consumer/rel/read_rel.rs:
##########
@@ -114,14 +114,37 @@ pub async fn from_read_rel(
.await
}
Some(ReadType::VirtualTable(vt)) => {
- if vt.values.is_empty() {
+ if vt.values.is_empty() && vt.expressions.is_empty() {
return Ok(LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,
schema: DFSchemaRef::new(substrait_schema),
}));
}
- let values = vt
+ let values = if vt.values.is_empty() {
Review Comment:
minor: I would suggest checking `!vt.expressions.is_empty()` to make it
clearer that `expressions` are preferred over values. Effectively, it's the
difference between "We are processing expressions because there are no values"
as opposed to "We are processing expressions because they are present".
##########
datafusion/substrait/tests/testdata/test_plans/select_count_from_select_1_virtual_table_expressions.substrait.json:
##########
@@ -0,0 +1,94 @@
+{
+ "extensionUris": [
+ {
+ "uri":
"https://github.com/substrait-io/substrait/blob/main/extensions/functions_aggregate_generic.yaml"
+ }
+ ],
+ "extensions": [
+ {
+ "extensionFunction": {
+ "functionAnchor": 185,
+ "name": "count:any"
+ }
+ }
+ ],
+ "relations": [
+ {
+ "root": {
+ "input": {
+ "aggregate": {
+ "common": {
+ "direct": {
+ }
+ },
+ "input": {
+ "read": {
Review Comment:
minor: I would suggest of getting rid of everything other than the `read` in
this plan, to make it clearer that this is a test of virtual table
functionality. The aggregation in this plan is effectively noise.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]