[ https://issues.apache.org/jira/browse/ARROW-17915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dewey Dunnington updated ARROW-17915:
-------------------------------------
Description:
After ARROW-16989 and ARROW-15584, there is new behaviour with ProjectRel. I
implemented a solution that worked with DuckDB's consumer in
https://github.com/voltrondata/substrait-r/pull/181, but when I try with
Arrow's compiler I get an error:
``` r
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
plan_as_json <- '{
"extensionUris": [
{
"extensionUriAnchor": 1,
"uri":
"https://github.com/apache/arrow/blob/master/format/substrait/extension_types.yaml"
}
],
"relations": [
{
"rel": {
"project": {
"common": {"emit": {"outputMapping": [2, 3]}},
"input": {
"read": {
"baseSchema": {
"names": ["int", "dbl"],
"struct": {"types": [{"i32": {}}, {"fp64": {}}]}
},
"localFiles": {
"items": [
{
"uriFile": "file://THIS_IS_THE_TEMP_FILE",
"parquet": {}
}
]
}
}
},
"expressions": [
{"selection": {"directReference": {"structField": {"field": 1}}}},
{"selection": {"directReference": {"structField": {"field": 0}}}}
]
}
}
}
]
}'
temp_parquet <- tempfile()
write_parquet(data.frame(int = integer(), dbl = double()), temp_parquet)
plan_as_json <- gsub("THIS_IS_THE_TEMP_FILE", temp_parquet, plan_as_json)
arrow:::do_exec_plan_substrait(plan_as_json)
#> Error: Invalid: Invalid column index to add field.
#> /Users/dewey/Desktop/rscratch/arrow/cpp/src/arrow/engine/substrait/relation_internal.cc:338 project_schema->AddField( num_columns + static_cast<int>(project.expressions().size()) - 1, std::move(project_field))
#> /Users/dewey/Desktop/rscratch/arrow/cpp/src/arrow/engine/substrait/serde.cc:156 FromProto(plan_rel.has_root() ? plan_rel.root().input() : plan_rel.rel(), ext_set, conversion_options)
```
<sup>Created on 2022-10-03 by the [reprex
package](https://reprex.tidyverse.org) (v2.0.1)</sup>
It's admittedly a goofy thing to do: computing a new column that is an
identical copy of an existing column and then discarding the original. I can and
should simplify the Substrait that I'm generating, but maybe this is also valid
Substrait that should be accepted?
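As a rough sketch of how the emit indices in the plan above resolve: this assumes that a ProjectRel's output before emit is the input columns followed by one column per expression, and that `outputMapping` is zero-based. `resolve_emit()` and the `copy_of_*` labels are hypothetical names used only for illustration; they are not part of arrow or substrait-r.

```r
# Hypothetical helper (not part of arrow or substrait-r): resolve an emit
# outputMapping against a ProjectRel's pre-emit output, assuming that output
# is the input columns followed by one column per expression.
resolve_emit <- function(input_names, expression_names, output_mapping) {
  candidates <- c(input_names, expression_names)
  # outputMapping is assumed zero-based: valid indices are 0 .. length - 1
  out_of_range <- output_mapping < 0 | output_mapping >= length(candidates)
  if (any(out_of_range)) {
    stop(
      "outputMapping index out of range: ",
      paste(output_mapping[out_of_range], collapse = ", ")
    )
  }
  candidates[output_mapping + 1L]  # R vectors are one-based
}

input_names <- c("int", "dbl")
# The two expressions reference input fields 1 and 0, in that order;
# these labels are made up for illustration.
expression_names <- c("copy_of_dbl", "copy_of_int")

# Indices 2 and 3 select the two computed columns and drop the originals.
resolve_emit(input_names, expression_names, c(2, 3))

# With four candidate columns (indices 0 to 3), a mapping of c(3, 4) would
# reference an index that does not exist, so the helper raises an error.
tryCatch(
  resolve_emit(input_names, expression_names, c(3, 4)),
  error = function(e) conditionMessage(e)
)
```

Under these assumptions, `[2, 3]` is in range for a two-column input with two expressions, which is why the plan looks like it could be valid Substrait.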
was:
After ARROW-16989 and ARROW-15584, there is new behaviour with ProjectRel. I
implemented a solution that worked with DuckDB's consumer in
https://github.com/voltrondata/substrait-r/pull/181, but when I try with
Arrow's compiler I get an error:
{code:R}
library(arrow, warn.conflicts = FALSE)
plan_as_json <- '{
"extensionUris": [
{
"extensionUriAnchor": 1,
"uri":
"https://github.com/apache/arrow/blob/master/format/substrait/extension_types.yaml"
}
],
"relations": [
{
"rel": {
"project": {
"common": {"emit": {"outputMapping": [3, 4]}},
"input": {
"read": {
"baseSchema": {
"names": ["int", "dbl"],
"struct": {"types": [{"i32": {}}, {"fp64": {}}]}
},
"localFiles": {
"items": [
{
"uriFile": "file://THIS_IS_THE_TEMP_FILE",
"parquet": {}
}
]
}
}
},
"expressions": [
{"selection": {"directReference": {"structField": {"field": 1}}}},
{"selection": {"directReference": {"structField": {"field": 0}}}}
]
}
}
}
]
}'
temp_parquet <- tempfile()
write_parquet(data.frame(int = integer(), dbl = double()), temp_parquet)
plan_as_json <- gsub("THIS_IS_THE_TEMP_FILE", temp_parquet, plan_as_json)
arrow:::do_exec_plan_substrait(plan_as_json)
#> Error: Invalid: Invalid column index to add field.
#> /Users/dewey/Desktop/rscratch/arrow/cpp/src/arrow/engine/substrait/relation_internal.cc:338 project_schema->AddField( num_columns + static_cast<int>(project.expressions().size()) - 1, std::move(project_field))
#> /Users/dewey/Desktop/rscratch/arrow/cpp/src/arrow/engine/substrait/serde.cc:156 FromProto(plan_rel.has_root() ? plan_rel.root().input() : plan_rel.rel(), ext_set, conversion_options)
{code}
It's admittedly a goofy thing to do: to compute a new column that is an
identical copy of an existing column and then discard the original. I can and
should simplify the substrait that I'm generating, but maybe this is also valid
substrait that should be accepted?
> [C++] Error when using Substrait ProjectRel
> -------------------------------------------
>
> Key: ARROW-17915
> URL: https://issues.apache.org/jira/browse/ARROW-17915
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Dewey Dunnington
> Priority: Major
>