Nicola Crane created ARROW-18403:
------------------------------------
Summary: [C++] Error consuming Substrait plan which uses count function: "only unary aggregate functions are currently supported"
Key: ARROW-18403
URL: https://issues.apache.org/jira/browse/ARROW-18403
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Nicola Crane
ARROW-17523 added support for the Substrait extension function "count", but
when I generate a Substrait plan that calls it and then try to run that plan
in Acero, I get an error.
The plan:
{code:r}
message of type 'substrait.Plan' with 3 fields set
extension_uris {
  extension_uri_anchor: 1
  uri: "https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml"
}
extension_uris {
  extension_uri_anchor: 2
  uri: "https://github.com/substrait-io/substrait/blob/main/extensions/functions_comparison.yaml"
}
extension_uris {
  extension_uri_anchor: 3
  uri: "https://github.com/substrait-io/substrait/blob/main/extensions/functions_aggregate_generic.yaml"
}
extensions {
  extension_function {
    extension_uri_reference: 3
    function_anchor: 2
    name: "count"
  }
}
relations {
  rel {
    aggregate {
      input {
        project {
          common {
            emit {
              output_mapping: 9
              output_mapping: 10
              output_mapping: 11
              output_mapping: 12
              output_mapping: 13
              output_mapping: 14
              output_mapping: 15
              output_mapping: 16
              output_mapping: 17
            }
          }
          input {
            read {
              base_schema {
                names: "int"
                names: "dbl"
                names: "dbl2"
                names: "lgl"
                names: "false"
                names: "chr"
                names: "verses"
                names: "padded_strings"
                names: "some_negative"
                struct_ {
                  types {
                    i32 {
                      nullability: NULLABILITY_NULLABLE
                    }
                  }
                  types {
                    fp64 {
                      nullability: NULLABILITY_NULLABLE
                    }
                  }
                  types {
                    fp64 {
                      nullability: NULLABILITY_NULLABLE
                    }
                  }
                  types {
                    bool_ {
                      nullability: NULLABILITY_NULLABLE
                    }
                  }
                  types {
                    bool_ {
                      nullability: NULLABILITY_NULLABLE
                    }
                  }
                  types {
                    string {
                      nullability: NULLABILITY_NULLABLE
                    }
                  }
                  types {
                    string {
                      nullability: NULLABILITY_NULLABLE
                    }
                  }
                  types {
                    string {
                      nullability: NULLABILITY_NULLABLE
                    }
                  }
                  types {
                    fp64 {
                      nullability: NULLABILITY_NULLABLE
                    }
                  }
                }
              }
              local_files {
                items {
                  uri_file: "file:///tmp/RtmpsBsoZJ/file1915f604cff4a"
                  parquet {
                  }
                }
              }
            }
          }
          expressions {
            selection {
              direct_reference {
                struct_field {
                }
              }
              root_reference {
              }
            }
          }
          expressions {
            selection {
              direct_reference {
                struct_field {
                  field: 1
                }
              }
              root_reference {
              }
            }
          }
          expressions {
            selection {
              direct_reference {
                struct_field {
                  field: 2
                }
              }
              root_reference {
              }
            }
          }
          expressions {
            selection {
              direct_reference {
                struct_field {
                  field: 3
                }
              }
              root_reference {
              }
            }
          }
          expressions {
            selection {
              direct_reference {
                struct_field {
                  field: 4
                }
              }
              root_reference {
              }
            }
          }
          expressions {
            selection {
              direct_reference {
                struct_field {
                  field: 5
                }
              }
              root_reference {
              }
            }
          }
          expressions {
            selection {
              direct_reference {
                struct_field {
                  field: 6
                }
              }
              root_reference {
              }
            }
          }
          expressions {
            selection {
              direct_reference {
                struct_field {
                  field: 7
                }
              }
              root_reference {
              }
            }
          }
          expressions {
            selection {
              direct_reference {
                struct_field {
                  field: 8
                }
              }
              root_reference {
              }
            }
          }
        }
      }
      groupings {
        grouping_expressions {
          selection {
            direct_reference {
              struct_field {
                field: 3
              }
            }
            root_reference {
            }
          }
        }
      }
      measures {
        measure {
          function_reference: 2
          phase: AGGREGATION_PHASE_INITIAL_TO_RESULT
          output_type {
            i64 {
              nullability: NULLABILITY_NULLABLE
            }
          }
          invocation: AGGREGATION_INVOCATION_ALL
        }
      }
    }
  }
}
{code}
The error:
{code:java}
Error: NotImplemented: Only unary aggregate functions are currently supported
/home/nic2/arrow/cpp/src/arrow/engine/substrait/relation_internal.cc:587
converter(aggregate_call)
/home/nic2/arrow/cpp/src/arrow/engine/substrait/serde.cc:153
FromProto(plan_rel.has_root() ? plan_rel.root().input() : plan_rel.rel(),
ext_set, conversion_options)
{code}
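One possible reading of the error, sketched below in Python: the "count" measure in the plan above carries no arguments, so a consumer that insists on exactly one argument per aggregate call would reject it. The sketch is an assumption about the kind of check involved, not the actual code in relation_internal.cc; the names Measure and check_unary are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Measure:
    """Hypothetical stand-in for a Substrait aggregate measure."""
    function_name: str
    # Names of the argument columns; empty for a count with no arguments.
    arguments: List[str] = field(default_factory=list)


def check_unary(measure: Measure) -> str:
    """Return the single argument, or fail the way the reported error does.

    Assumption: the consumer accepts only measures with exactly one argument.
    """
    if len(measure.arguments) != 1:
        raise NotImplementedError(
            "Only unary aggregate functions are currently supported"
        )
    return measure.arguments[0]


if __name__ == "__main__":
    # The measure from the plan: function "count" with no arguments.
    try:
        check_unary(Measure("count"))
    except NotImplementedError as err:
        print(f"NotImplemented: {err}")

    # A count over one explicit column would pass this hypothetical check.
    print(check_unary(Measure("count", ["lgl"])))
```

If this reading is right, a plan whose count measure names one column as its argument might get past this check, though that has not been verified against Acero here.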
I don't know what the "phase" and "invocation" fields above do. Earlier
attempts to get Acero to consume this plan failed because I had left those
fields at their default values (e.g. "Not Implemented: Unsupported
aggregation phase 'AGGREGATION_PHASE_UNSPECIFIED'"), so I set them to the
values shown above to see if it helped.