cpcloud commented on a change in pull request #10934:
URL: https://github.com/apache/arrow/pull/10934#discussion_r698458633



##########
File path: format/experimental/computeir/Expression.fbs
##########
@@ -0,0 +1,351 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+include "../../Schema.fbs";
+include "Literal.fbs";
+include "InlineBuffer.fbs";
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+/// Access a value for a given map key
+table MapKey {
+  key: string (required);
+}
+
+/// Struct field access
+table StructField {
+  /// The position of the field in the struct schema
+  position: uint32;
+}
+
+/// Zero-based array index
+table ArraySubscript {
+  position: uint32;
+}
+
+/// Zero-based range of elements in an array
+table ArraySlice {
+  /// The start of an array slice, inclusive
+  start_inclusive: uint32;
+  /// The end of an array slice, exclusive
+  end_exclusive: uint32;
+}
+
+/// Field name in a relation
+table FieldName {
+  position: uint32;
+}
+
+/// A union of possible dereference operations
+union Deref {
+  /// Access a value for a given map key
+  MapKey,
+  /// Access the value at a struct field
+  StructField,
+  /// Access the element at a given index in an array
+  ArraySubscript,
+  /// Access a range of elements in an array
+  ArraySlice,
+  /// Access a field of a relation
+  FieldName,
+}
+
+/// Access the data of a field
+table FieldRef {
+  /// A sequence of field names to allow referencing potentially nested fields
+  ref: Deref (required);
+  /// For Expressions which might reference fields in multiple Relations,
+  /// this index may be provided to indicate which Relation's fields
+  /// `path` points into. For example in the case of a join,
+  /// 0 refers to the left relation and 1 to the right relation.
+  relation_index: int;
+}
+
+/// A canonical (probably SQL equivalent) function
+//
+// TODO: variadics
+enum CanonicalFunctionId : uint32 {

Review comment:
       Are we actually sure this is possible to do for all of the types we want 
to support?
   
   What is the key for, say, the `==` operator on a complex type like list or 
struct?
   
   I don't think a wildcard type is well-defined here without more 
clarification. For example, `List<T>` can only be compared with `List<U>` if 
`T == U`, but if `T != U` the operation is undefined.
   
   Unnest is another example.
   
   Without a type system that handles generics, you can't write down the type 
of all possible instantiations of any type that has a type parameter, such as 
list, map, and struct.
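   
   To make the generics point concrete, here is a minimal Python sketch (not 
part of the PR) of what checking `==` on `List<T>` might involve. The 
`Primitive`, `ListOf`, `TypeVar`, and `unify` names are hypothetical; the only 
idea taken from the discussion is that the signature needs a type variable, 
because the instantiations cannot be enumerated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    name: str


@dataclass(frozen=True)
class ListOf:
    value_type: object  # the T in List<T>


@dataclass(frozen=True)
class TypeVar:
    name: str  # placeholder that must bind to one concrete type


def unify(pattern, concrete, bindings):
    """Match a signature pattern against a concrete type, binding type variables."""
    if isinstance(pattern, TypeVar):
        # First use binds the variable; later uses must agree with the binding.
        bound = bindings.setdefault(pattern.name, concrete)
        return bound == concrete
    if isinstance(pattern, ListOf) and isinstance(concrete, ListOf):
        return unify(pattern.value_type, concrete.value_type, bindings)
    return pattern == concrete


# Hypothetical generic signature: equals(List<T>, List<T>)
EQUALS_LIST = (ListOf(TypeVar("T")), ListOf(TypeVar("T")))


def equals_is_defined(left, right):
    """True only when both operands are lists with the same element type."""
    bindings = {}
    return all(unify(p, c, bindings) for p, c in zip(EQUALS_LIST, (left, right)))
```

   A single rule covers `List<int32>`, `List<List<int32>>`, and so on, which a 
flat table of concrete signatures cannot do.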
   
   What is the issue with having a list of functions in some structured format 
that indicates the canonical name of the function and its arity?
   
   If a producer sends over a call to the add function with input types 
`int32, int32` and output type `int32`, then the consumer looks that up. If it 
is able to execute that IR, it does; if not, it returns an error.
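   
   A rough Python sketch of that lookup, under the assumption that the 
consumer keys its kernels on (name, input types, output type). `KERNELS`, 
`UnsupportedCall`, and `execute` are made-up names for illustration and do not 
exist in Arrow; the `add` kernel also ignores int32 overflow.

```python
from typing import Callable, Dict, Sequence, Tuple

# (canonical name, input types, output type)
Signature = Tuple[str, Tuple[str, ...], str]


class UnsupportedCall(Exception):
    """Raised when the consumer has no kernel for the requested signature."""


# Hypothetical consumer-side table of the calls it can execute.
KERNELS: Dict[Signature, Callable[..., int]] = {
    ("add", ("int32", "int32"), "int32"): lambda a, b: a + b,
}


def execute(name: str, input_types: Sequence[str], output_type: str, args: Sequence[int]) -> int:
    """Look the call up by its full signature; run it or report an error."""
    key = (name, tuple(input_types), output_type)
    kernel = KERNELS.get(key)
    if kernel is None:
        raise UnsupportedCall(f"no kernel for {name}{tuple(input_types)} -> {output_type}")
    return kernel(*args)
```

   The consumer never has to guess: either the exact signature the producer 
asked for is present, or the call is rejected.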

##########
File path: format/experimental/computeir/Expression.fbs
##########
@@ -0,0 +1,351 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+include "../../Schema.fbs";
+include "Literal.fbs";
+include "InlineBuffer.fbs";
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+/// Access a value for a given map key
+table MapKey {
+  key: string (required);
+}
+
+/// Struct field access
+table StructField {
+  /// The position of the field in the struct schema
+  position: uint32;
+}
+
+/// Zero-based array index
+table ArraySubscript {
+  position: uint32;
+}
+
+/// Zero-based range of elements in an array
+table ArraySlice {
+  /// The start of an array slice, inclusive
+  start_inclusive: uint32;
+  /// The end of an array slice, exclusive
+  end_exclusive: uint32;
+}
+
+/// Field name in a relation
+table FieldName {
+  position: uint32;
+}
+
+/// A union of possible dereference operations
+union Deref {
+  /// Access a value for a given map key
+  MapKey,
+  /// Access the value at a struct field
+  StructField,
+  /// Access the element at a given index in an array
+  ArraySubscript,
+  /// Access a range of elements in an array
+  ArraySlice,
+  /// Access a field of a relation
+  FieldName,
+}
+
+/// Access the data of a field
+table FieldRef {
+  /// A sequence of field names to allow referencing potentially nested fields
+  ref: Deref (required);
+  /// For Expressions which might reference fields in multiple Relations,
+  /// this index may be provided to indicate which Relation's fields
+  /// `path` points into. For example in the case of a join,
+  /// 0 refers to the left relation and 1 to the right relation.
+  relation_index: int;
+}
+
+/// A canonical (probably SQL equivalent) function
+//
+// TODO: variadics
+enum CanonicalFunctionId : uint32 {
+  // logical
+  And,
+  Not,
+  Or,
+
+  // arithmetic
+  Add,
+  Subtract,
+  Multiply,
+  Divide,
+  Power,
+  AbsoluteValue,
+  Negate,
+  Sign,
+
+  // date/time/timestamp operations
+  DateSub,
+  DateAdd,
+  DateDiff,
+  TimeAdd,
+  TimeSub,
+  TimeDiff,
+  TimestampAdd,
+  TimestampSub,
+  TimestampDiff,
+
+  // comparison
+  Equals,
+  NotEquals,
+  Greater,
+  GreaterEqual,
+  Less,
+  LessEqual,
+}
+
+table CanonicalFunction {
+  id: CanonicalFunctionId;
+}
+
+table NonCanonicalFunction {
+  name_space: string;
+  name: string (required);
+}
+
+union FunctionImpl {
+  CanonicalFunction,
+  NonCanonicalFunction,
+}
+
+/// A function call expression
+table Call {
+  /// The kind of function call this is.
+  kind: FunctionImpl (required);
+
+  /// The arguments passed to `function_name`.
+  arguments: [Expression] (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// A single WHEN x THEN y fragment.
+table CaseFragment {
+  when: Expression (required);
+  then: Expression (required);
+}
+
+/// Case statement-style expression.
+table Case {
+  cases: [CaseFragment] (required);
+  /// The default value if no cases match. This is typically NULL in SQL
+  //implementations.
+  ///
+  /// Defaulting to NULL is a frontend choice, so producers must specify NULL
+  /// if that's their desired behavior.
+  default: Expression (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Cast {
+  /// The expression to cast
+  expression: Expression (required);
+
+  /// The type to cast `argument` to.
+  type: org.apache.arrow.flatbuf.Field (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Extract {
+  /// Expression from which to extract components.
+  expression: Expression (required);
+
+  /// Field to extract from `expression`.
+  field: string (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// Whether lesser values should precede greater or vice versa,
+/// also whether nulls should preced or follow values.
+enum Ordering : uint8 {
+  ASCENDING_THEN_NULLS,
+  DESCENDING_THEN_NULLS,
+  NULLS_THEN_ASCENDING,
+  NULLS_THEN_DESCENDING
+}
+
+/// An expression with an order
+table SortKey {
+  expression: Expression (required);
+  ordering: Ordering = ASCENDING_THEN_NULLS;
+}
+
+/// Boundary is unbounded
+table Unbounded {}
+
+union ConcreteBoundImpl {
+  Expression,
+  Unbounded,
+}
+
+/// Boundary is preceding rows, determined by the contained expression
+table Preceding {
+  ipml: ConcreteBoundImpl (required);
+}
+
+/// Boundary is following rows, determined by the contained expression
+table Following {
+  impl: ConcreteBoundImpl (required);

Review comment:
       Thought about this over the weekend, and I'm not very keen on renaming 
everything with a `Wrapper` suffix.
   
   The objects are named the way they are named for a specific reason, and that 
is to indicate that they are the objects developers of IR producers/consumers 
should use.
   
   With `Foo` and `FooWrapper`, it's not clear to me as a developer that I 
should use the `FooWrapper` and not the `Foo`. With `Foo` and `FooImpl`, it is 
much clearer to my eye which object is public. I could of course add a comment 
about that, but that's net additional work versus what's already here, for very 
little gain other than to remind people.

##########
File path: format/experimental/computeir/Expression.fbs
##########
@@ -0,0 +1,351 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+include "../../Schema.fbs";
+include "Literal.fbs";
+include "InlineBuffer.fbs";
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+/// Access a value for a given map key
+table MapKey {
+  key: string (required);
+}
+
+/// Struct field access
+table StructField {
+  /// The position of the field in the struct schema
+  position: uint32;
+}
+
+/// Zero-based array index
+table ArraySubscript {
+  position: uint32;
+}
+
+/// Zero-based range of elements in an array
+table ArraySlice {
+  /// The start of an array slice, inclusive
+  start_inclusive: uint32;
+  /// The end of an array slice, exclusive
+  end_exclusive: uint32;
+}
+
+/// Field name in a relation
+table FieldName {
+  position: uint32;
+}
+
+/// A union of possible dereference operations
+union Deref {
+  /// Access a value for a given map key
+  MapKey,
+  /// Access the value at a struct field
+  StructField,
+  /// Access the element at a given index in an array
+  ArraySubscript,
+  /// Access a range of elements in an array
+  ArraySlice,
+  /// Access a field of a relation
+  FieldName,
+}
+
+/// Access the data of a field
+table FieldRef {
+  /// A sequence of field names to allow referencing potentially nested fields
+  ref: Deref (required);
+  /// For Expressions which might reference fields in multiple Relations,
+  /// this index may be provided to indicate which Relation's fields
+  /// `path` points into. For example in the case of a join,
+  /// 0 refers to the left relation and 1 to the right relation.
+  relation_index: int;
+}
+
+/// A canonical (probably SQL equivalent) function
+//
+// TODO: variadics
+enum CanonicalFunctionId : uint32 {
+  // logical
+  And,
+  Not,
+  Or,
+
+  // arithmetic
+  Add,
+  Subtract,
+  Multiply,
+  Divide,
+  Power,
+  AbsoluteValue,
+  Negate,
+  Sign,
+
+  // date/time/timestamp operations
+  DateSub,
+  DateAdd,
+  DateDiff,
+  TimeAdd,
+  TimeSub,
+  TimeDiff,
+  TimestampAdd,
+  TimestampSub,
+  TimestampDiff,
+
+  // comparison
+  Equals,
+  NotEquals,
+  Greater,
+  GreaterEqual,
+  Less,
+  LessEqual,
+}
+
+table CanonicalFunction {
+  id: CanonicalFunctionId;
+}
+
+table NonCanonicalFunction {
+  name_space: string;
+  name: string (required);
+}
+
+union FunctionImpl {
+  CanonicalFunction,
+  NonCanonicalFunction,
+}
+
+/// A function call expression
+table Call {
+  /// The kind of function call this is.
+  kind: FunctionImpl (required);
+
+  /// The arguments passed to `function_name`.
+  arguments: [Expression] (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// A single WHEN x THEN y fragment.
+table CaseFragment {
+  when: Expression (required);
+  then: Expression (required);
+}
+
+/// Case statement-style expression.
+table Case {
+  cases: [CaseFragment] (required);
+  /// The default value if no cases match. This is typically NULL in SQL
+  //implementations.
+  ///
+  /// Defaulting to NULL is a frontend choice, so producers must specify NULL
+  /// if that's their desired behavior.
+  default: Expression (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Cast {
+  /// The expression to cast
+  expression: Expression (required);
+
+  /// The type to cast `argument` to.
+  type: org.apache.arrow.flatbuf.Field (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Extract {
+  /// Expression from which to extract components.
+  expression: Expression (required);
+
+  /// Field to extract from `expression`.
+  field: string (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// Whether lesser values should precede greater or vice versa,
+/// also whether nulls should preced or follow values.
+enum Ordering : uint8 {
+  ASCENDING_THEN_NULLS,
+  DESCENDING_THEN_NULLS,
+  NULLS_THEN_ASCENDING,
+  NULLS_THEN_DESCENDING
+}
+
+/// An expression with an order
+table SortKey {
+  expression: Expression (required);
+  ordering: Ordering = ASCENDING_THEN_NULLS;
+}
+
+/// Boundary is unbounded
+table Unbounded {}
+
+union ConcreteBoundImpl {
+  Expression,
+  Unbounded,
+}
+
+/// Boundary is preceding rows, determined by the contained expression
+table Preceding {
+  ipml: ConcreteBoundImpl (required);
+}
+
+/// Boundary is following rows, determined by the contained expression
+table Following {
+  impl: ConcreteBoundImpl (required);
+}
+
+/// Boundary is the current row
+table CurrentRow {}
+
+union BoundImpl {
+  Preceding,
+  Following,
+  CurrentRow,
+}
+
+/// Boundary of a window
+table Bound {
+  impl: BoundImpl (required);
+}
+
+/// The kind of window function to be executed.
+enum Frame : uint8 {
+  Rows,
+  Range,
+}
+
+/// An expression representing a window function call.
+table WindowCall {
+  /// The kind of window frame
+  kind: Frame;
+  /// The expression to operate over
+  expression: Expression (required);
+  /// Partition keys
+  partitions: [Expression] (required);
+  /// Sort keys
+  orderings: [SortKey] (required);
+  /// Lower window bound
+  lower_bound: Bound (required);
+  /// Upper window bound
+  upper_bound: Bound (required);
+}
+
+/// A canonical (probably SQL equivalent) function
+enum CanonicalAggregateId : uint32 {
+  All,
+  Any,
+  Count,
+  CountTable,
+  Mean,
+  Min,
+  Max,
+  Product,
+  Sum,
+  Variance,
+  StandardDev,
+}
+
+
+table CanonicalAggregate {
+  id: CanonicalAggregateId;
+}
+
+table NonCanonicalAggregate {
+  name_space: string;
+  name: string (required);
+}
+
+union AggregateImpl {
+  CanonicalAggregate,
+  NonCanonicalAggregate,
+}
+
+table AggregateCall {
+  /// The kind of aggregate function being executed
+  kind: AggregateImpl (required);
+
+  /// Aggregate expression arguments
+  arguments: [Expression] (required);
+
+  /// Possible ordering.
+  orderings: [SortKey];
+
+  /// optional per-aggregate filtering
+  predicate: Expression;
+}
+
+/// An expression is one of
+/// - a Literal datum
+/// - a reference to a field from a Relation
+/// - a call to a named function
+/// - a case expression
+/// - a cast expression
+/// - an extract operation
+/// - a window function call
+/// - an aggregate function call
+///
+/// The expressions here that look like function calls such as
+/// Cast,Case and Extract are special in that while they might
+/// fit into a Call, they don't cleanly do so without having
+/// to pass around non-expression arguments as metadata.
+///
+/// AggregateCall and WindowCall are also separate variants
+/// due to special options for each that don't apply to generic
+/// function calls. Again this is done to make it easier
+/// for consumers to deal with the structure of the operation
+union ExpressionImpl {
+  Literal,
+  FieldRef,
+  Call,
+  Case,
+  Cast,
+  Extract,
+  WindowCall,
+  AggregateCall,
+}
+
+/// Expression types
+///
+/// Expressions have a concrete `impl` value, which is a specific operation
+/// They also have a `type` field, which is the output type of the expression,
+/// regardless of operation type.
+///
+/// The only exception so far is Cast, which has a type as input argument, which
+/// is equal to output type.
+table Expression {
+  impl: ExpressionImpl (required);
+
+  /// The type of the expression.
+  ///
+  /// This is a field, because the Type union in Schema.fbs
+  /// isn't self-contained: Fields are necessary to describe complex types
+  /// and there's currently no reason to optimize the storage of this.
+  type: org.apache.arrow.flatbuf.Field;

Review comment:
       I think the purpose of this field is potentially being misunderstood.
   
   There are two very broad use cases that this is designed for:
   
   1. A producer and a consumer that are not self-contained. This is the 
broadest use case and I think will be the primary way in which the IR is used. 
In this scenario, the producer inserts a type into this field as the output of 
its type system. The consumer is then required to adhere to what the producer 
asked for, or return an error.
   2. A producer and consumer that are self-contained. This is for a system 
such as a relational database (e.g., DuckDB), where the IR is being used by a 
system that controls both the producer and the consumer. The purpose of the 
type field here is to have a place for the output of a type derivation step to 
be stored after its derivation for later consumption.
   
   Additionally, the field is optional to allow for the type derivation phase 
to happen at any point in time.
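   
   As a minimal illustration of why an optional field helps, here is a Python 
sketch in which a producer may set the type up front, or a later derivation 
pass fills it in. `Expr`, `derive_types`, and the single `add` rule are 
hypothetical and only stand in for whatever type system a producer uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Expr:
    op: str
    args: List["Expr"] = field(default_factory=list)
    type: Optional[str] = None  # mirrors the optional `type` field on Expression


def derive_types(expr: Expr) -> Expr:
    """Fill in missing types bottom-up; never overwrite a type already set."""
    for arg in expr.args:
        derive_types(arg)
    if expr.type is None:
        if expr.op == "add" and expr.args:
            # Hypothetical rule: the result takes the type of the first argument.
            expr.type = expr.args[0].type
        else:
            raise ValueError(f"cannot derive a type for {expr.op!r}")
    return expr
```

   A type the producer already chose is left alone, so the same structure 
serves both use cases.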
   
   What I think would be a mistake is to not have this field here and assume 
the producer and consumer will necessarily derive the same output type for 
every expression. We've already said that defining output type derivation rules 
is the responsibility of producers. Are we changing that decision?
   
   Regarding performance, I don't think that's something we should focus on 
until we're happy with the design that supports the use cases we want to 
support, and until we have some clarity on what is actually expensive in 
real-world use cases.

##########
File path: format/experimental/computeir/Expression.fbs
##########
@@ -0,0 +1,351 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+include "../../Schema.fbs";
+include "Literal.fbs";
+include "InlineBuffer.fbs";
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+/// Access a value for a given map key
+table MapKey {
+  key: string (required);
+}
+
+/// Struct field access
+table StructField {
+  /// The position of the field in the struct schema
+  position: uint32;
+}
+
+/// Zero-based array index
+table ArraySubscript {
+  position: uint32;
+}
+
+/// Zero-based range of elements in an array
+table ArraySlice {
+  /// The start of an array slice, inclusive
+  start_inclusive: uint32;
+  /// The end of an array slice, exclusive
+  end_exclusive: uint32;
+}
+
+/// Field name in a relation
+table FieldName {
+  position: uint32;
+}
+
+/// A union of possible dereference operations
+union Deref {
+  /// Access a value for a given map key
+  MapKey,
+  /// Access the value at a struct field
+  StructField,
+  /// Access the element at a given index in an array
+  ArraySubscript,
+  /// Access a range of elements in an array
+  ArraySlice,
+  /// Access a field of a relation
+  FieldName,
+}
+
+/// Access the data of a field
+table FieldRef {
+  /// A sequence of field names to allow referencing potentially nested fields
+  ref: Deref (required);
+  /// For Expressions which might reference fields in multiple Relations,
+  /// this index may be provided to indicate which Relation's fields
+  /// `path` points into. For example in the case of a join,
+  /// 0 refers to the left relation and 1 to the right relation.
+  relation_index: int;
+}
+
+/// A canonical (probably SQL equivalent) function
+//
+// TODO: variadics
+enum CanonicalFunctionId : uint32 {
+  // logical
+  And,
+  Not,
+  Or,
+
+  // arithmetic
+  Add,
+  Subtract,
+  Multiply,
+  Divide,
+  Power,
+  AbsoluteValue,
+  Negate,
+  Sign,
+
+  // date/time/timestamp operations
+  DateSub,
+  DateAdd,
+  DateDiff,
+  TimeAdd,
+  TimeSub,
+  TimeDiff,
+  TimestampAdd,
+  TimestampSub,
+  TimestampDiff,
+
+  // comparison
+  Equals,
+  NotEquals,
+  Greater,
+  GreaterEqual,
+  Less,
+  LessEqual,
+}
+
+table CanonicalFunction {
+  id: CanonicalFunctionId;
+}
+
+table NonCanonicalFunction {
+  name_space: string;
+  name: string (required);
+}
+
+union FunctionImpl {
+  CanonicalFunction,
+  NonCanonicalFunction,
+}
+
+/// A function call expression
+table Call {
+  /// The kind of function call this is.
+  kind: FunctionImpl (required);
+
+  /// The arguments passed to `function_name`.
+  arguments: [Expression] (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// A single WHEN x THEN y fragment.
+table CaseFragment {
+  when: Expression (required);
+  then: Expression (required);
+}
+
+/// Case statement-style expression.
+table Case {
+  cases: [CaseFragment] (required);
+  /// The default value if no cases match. This is typically NULL in SQL
+  //implementations.
+  ///
+  /// Defaulting to NULL is a frontend choice, so producers must specify NULL
+  /// if that's their desired behavior.
+  default: Expression (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Cast {
+  /// The expression to cast
+  expression: Expression (required);
+
+  /// The type to cast `argument` to.
+  type: org.apache.arrow.flatbuf.Field (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Extract {
+  /// Expression from which to extract components.
+  expression: Expression (required);
+
+  /// Field to extract from `expression`.
+  field: string (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// Whether lesser values should precede greater or vice versa,
+/// also whether nulls should preced or follow values.
+enum Ordering : uint8 {
+  ASCENDING_THEN_NULLS,
+  DESCENDING_THEN_NULLS,
+  NULLS_THEN_ASCENDING,
+  NULLS_THEN_DESCENDING
+}
+
+/// An expression with an order
+table SortKey {
+  expression: Expression (required);
+  ordering: Ordering = ASCENDING_THEN_NULLS;
+}
+
+/// Boundary is unbounded
+table Unbounded {}
+
+union ConcreteBoundImpl {
+  Expression,
+  Unbounded,
+}
+
+/// Boundary is preceding rows, determined by the contained expression
+table Preceding {
+  ipml: ConcreteBoundImpl (required);
+}
+
+/// Boundary is following rows, determined by the contained expression
+table Following {
+  impl: ConcreteBoundImpl (required);
+}
+
+/// Boundary is the current row
+table CurrentRow {}
+
+union BoundImpl {
+  Preceding,
+  Following,
+  CurrentRow,
+}
+
+/// Boundary of a window
+table Bound {
+  impl: BoundImpl (required);
+}
+
+/// The kind of window function to be executed.
+enum Frame : uint8 {
+  Rows,
+  Range,
+}
+
+/// An expression representing a window function call.
+table WindowCall {
+  /// The kind of window frame
+  kind: Frame;
+  /// The expression to operate over
+  expression: Expression (required);
+  /// Partition keys
+  partitions: [Expression] (required);
+  /// Sort keys
+  orderings: [SortKey] (required);
+  /// Lower window bound
+  lower_bound: Bound (required);
+  /// Upper window bound
+  upper_bound: Bound (required);
+}
+
+/// A canonical (probably SQL equivalent) function
+enum CanonicalAggregateId : uint32 {
+  All,
+  Any,
+  Count,
+  CountTable,
+  Mean,
+  Min,
+  Max,
+  Product,
+  Sum,
+  Variance,
+  StandardDev,
+}
+
+
+table CanonicalAggregate {
+  id: CanonicalAggregateId;
+}
+
+table NonCanonicalAggregate {
+  name_space: string;
+  name: string (required);
+}
+
+union AggregateImpl {
+  CanonicalAggregate,
+  NonCanonicalAggregate,
+}
+
+table AggregateCall {
+  /// The kind of aggregate function being executed
+  kind: AggregateImpl (required);
+
+  /// Aggregate expression arguments
+  arguments: [Expression] (required);
+
+  /// Possible ordering.
+  orderings: [SortKey];
+
+  /// optional per-aggregate filtering
+  predicate: Expression;
+}
+
+/// An expression is one of
+/// - a Literal datum
+/// - a reference to a field from a Relation
+/// - a call to a named function
+/// - a case expression
+/// - a cast expression
+/// - an extract operation
+/// - a window function call
+/// - an aggregate function call
+///
+/// The expressions here that look like function calls such as
+/// Cast,Case and Extract are special in that while they might
+/// fit into a Call, they don't cleanly do so without having
+/// to pass around non-expression arguments as metadata.
+///
+/// AggregateCall and WindowCall are also separate variants
+/// due to special options for each that don't apply to generic
+/// function calls. Again this is done to make it easier
+/// for consumers to deal with the structure of the operation
+union ExpressionImpl {
+  Literal,
+  FieldRef,
+  Call,
+  Case,
+  Cast,
+  Extract,
+  WindowCall,
+  AggregateCall,
+}
+
+/// Expression types
+///
+/// Expressions have a concrete `impl` value, which is a specific operation
+/// They also have a `type` field, which is the output type of the expression,
+/// regardless of operation type.
+///
+/// The only exception so far is Cast, which has a type as input argument, which
+/// is equal to output type.
+table Expression {
+  impl: ExpressionImpl (required);
+
+  /// The type of the expression.
+  ///
+  /// This is a field, because the Type union in Schema.fbs
+  /// isn't self-contained: Fields are necessary to describe complex types
+  /// and there's currently no reason to optimize the storage of this.
+  type: org.apache.arrow.flatbuf.Field;

Review comment:
       > I think that the rationale is inverted: usually planners (IR producers) 
are the ones that need to know which signatures the consumers accept, so that 
they can plan eventual detours (e.g. perform some casts to match signatures).
   
   I get what you're saying here, but right now the IR is explicitly not 
concerned with how (or even whether) a producer is aware of what types of 
operations are accepted by a consumer.
   
   This is compatible with systems that *do* have additional knowledge; the IR 
is designed to be unopinionated about how a producer deduces types.
   
   The flow of execution you're describing is compatible with the current IR if 
I understand the flow correctly: the producer can query the possible 
operations, figure out how (and whether) to execute the desired operations, and 
then put all of that into IR and send it to the consumer.
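
A minimal Python sketch of that flow, under stated assumptions: the `CONSUMER_SIGNATURES` registry, the `WIDENINGS` table, and the `FieldRef`/`Cast`/`Call` dataclasses are all hypothetical stand-ins for illustration. They are not the FlatBuffers-generated classes or any Arrow API, and the IR itself prescribes none of this.

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for IR nodes (not generated from the .fbs files).
@dataclass(frozen=True)
class FieldRef:
    position: int
    type: str

@dataclass(frozen=True)
class Cast:
    expression: object
    type: str

@dataclass(frozen=True)
class Call:
    name: str
    arguments: tuple
    type: str

# Hypothetical registry a consumer might advertise before planning:
# (function name, argument types) -> return type.
CONSUMER_SIGNATURES = {
    ("Add", ("int64", "int64")): "int64",
    ("Add", ("float64", "float64")): "float64",
}

# Hypothetical widening casts this producer is willing to insert.
WIDENINGS = {"int32": ("int64", "float64"), "int64": ("float64",)}

def plan_call(name, args):
    """Return a Call the consumer accepts, inserting Casts where needed."""
    arg_types = tuple(a.type for a in args)
    for (fn, sig), ret in CONSUMER_SIGNATURES.items():
        if fn != name or len(sig) != len(args):
            continue
        # A signature is usable if each argument matches or can be widened.
        if all(have == want or want in WIDENINGS.get(have, ())
               for have, want in zip(arg_types, sig)):
            planned = tuple(a if a.type == want else Cast(a, want)
                            for a, want in zip(args, sig))
            return Call(name, planned, ret)
    # No usable signature: the producer errors out during planning.
    raise TypeError(f"consumer has no usable signature for {name}{arg_types}")
```

Here `plan_call("Add", (FieldRef(0, "int32"), FieldRef(1, "int64")))` wraps the first argument in a `Cast` to `int64` and leaves the second untouched. The resulting tree is what the producer would then serialize as IR.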
   
   > I would imagine that consumers would have a registry of implemented 
functions, which they would inform producers about before planning, so that the 
producer can error out or "dodge via cast" during planning.
   
   Indeed, see the discussion about registries/signatures here: 
https://github.com/apache/arrow/pull/10934/files#r697016776
   
   > The main question for me is whether we want to use logical types or 
physical types.
   
   You can use either, unless the type systems for representing physical and 
logical types are distinct trees. AFAICT Arrow doesn't distinguish between 
logical and physical types in `Schema.fbs`. Is that correct?

##########
File path: format/experimental/computeir/Expression.fbs
##########
@@ -0,0 +1,351 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+include "../../Schema.fbs";
+include "Literal.fbs";
+include "InlineBuffer.fbs";
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+/// Access a value for a given map key
+table MapKey {
+  key: string (required);
+}
+
+/// Struct field access
+table StructField {
+  /// The position of the field in the struct schema
+  position: uint32;
+}
+
+/// Zero-based array index
+table ArraySubscript {
+  position: uint32;
+}
+
+/// Zero-based range of elements in an array
+table ArraySlice {
+  /// The start of an array slice, inclusive
+  start_inclusive: uint32;
+  /// The end of an array slice, exclusive
+  end_exclusive: uint32;
+}
+
+/// Field name in a relation
+table FieldName {
+  position: uint32;
+}
+
+/// A union of possible dereference operations
+union Deref {
+  /// Access a value for a given map key
+  MapKey,
+  /// Access the value at a struct field
+  StructField,
+  /// Access the element at a given index in an array
+  ArraySubscript,
+  /// Access a range of elements in an array
+  ArraySlice,
+  /// Access a field of a relation
+  FieldName,
+}
+
+/// Access the data of a field
+table FieldRef {
+  /// A sequence of field names to allow referencing potentially nested fields
+  ref: Deref (required);
+  /// For Expressions which might reference fields in multiple Relations,
+  /// this index may be provided to indicate which Relation's fields
+  /// `path` points into. For example in the case of a join,
+  /// 0 refers to the left relation and 1 to the right relation.
+  relation_index: int;
+}
+
+/// A canonical (probably SQL equivalent) function
+//
+// TODO: variadics
+enum CanonicalFunctionId : uint32 {
+  // logical
+  And,
+  Not,
+  Or,
+
+  // arithmetic
+  Add,
+  Subtract,
+  Multiply,
+  Divide,
+  Power,
+  AbsoluteValue,
+  Negate,
+  Sign,
+
+  // date/time/timestamp operations
+  DateSub,
+  DateAdd,
+  DateDiff,
+  TimeAdd,
+  TimeSub,
+  TimeDiff,
+  TimestampAdd,
+  TimestampSub,
+  TimestampDiff,
+
+  // comparison
+  Equals,
+  NotEquals,
+  Greater,
+  GreaterEqual,
+  Less,
+  LessEqual,
+}
+
+table CanonicalFunction {
+  id: CanonicalFunctionId;
+}
+
+table NonCanonicalFunction {
+  name_space: string;
+  name: string (required);
+}
+
+union FunctionImpl {
+  CanonicalFunction,
+  NonCanonicalFunction,
+}
+
+/// A function call expression
+table Call {
+  /// The kind of function call this is.
+  kind: FunctionImpl (required);
+
+  /// The arguments passed to `function_name`.
+  arguments: [Expression] (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// A single WHEN x THEN y fragment.
+table CaseFragment {
+  when: Expression (required);
+  then: Expression (required);
+}
+
+/// Case statement-style expression.
+table Case {
+  cases: [CaseFragment] (required);
+  /// The default value if no cases match. This is typically NULL in SQL
+  /// implementations.
+  ///
+  /// Defaulting to NULL is a frontend choice, so producers must specify NULL
+  /// if that's their desired behavior.
+  default: Expression (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Cast {
+  /// The expression to cast
+  expression: Expression (required);
+
+  /// The type to cast `argument` to.
+  type: org.apache.arrow.flatbuf.Field (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Extract {
+  /// Expression from which to extract components.
+  expression: Expression (required);
+
+  /// Field to extract from `expression`.
+  field: string (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// Whether lesser values should precede greater or vice versa,
+/// also whether nulls should precede or follow values.
+enum Ordering : uint8 {
+  ASCENDING_THEN_NULLS,
+  DESCENDING_THEN_NULLS,
+  NULLS_THEN_ASCENDING,
+  NULLS_THEN_DESCENDING
+}
+
+/// An expression with an order
+table SortKey {
+  expression: Expression (required);
+  ordering: Ordering = ASCENDING_THEN_NULLS;
+}
+
+/// Boundary is unbounded
+table Unbounded {}
+
+union ConcreteBoundImpl {
+  Expression,
+  Unbounded,
+}
+
+/// Boundary is preceding rows, determined by the contained expression
+table Preceding {
+  impl: ConcreteBoundImpl (required);
+}
+
+/// Boundary is following rows, determined by the contained expression
+table Following {
+  impl: ConcreteBoundImpl (required);
+}
+
+/// Boundary is the current row
+table CurrentRow {}
+
+union BoundImpl {
+  Preceding,
+  Following,
+  CurrentRow,
+}
+
+/// Boundary of a window
+table Bound {
+  impl: BoundImpl (required);
+}
+
+/// The kind of window function to be executed.
+enum Frame : uint8 {
+  Rows,
+  Range,
+}
+
+/// An expression representing a window function call.
+table WindowCall {
+  /// The kind of window frame
+  kind: Frame;
+  /// The expression to operate over
+  expression: Expression (required);
+  /// Partition keys
+  partitions: [Expression] (required);
+  /// Sort keys
+  orderings: [SortKey] (required);
+  /// Lower window bound
+  lower_bound: Bound (required);
+  /// Upper window bound
+  upper_bound: Bound (required);
+}
+
+/// A canonical (probably SQL equivalent) function
+enum CanonicalAggregateId : uint32 {
+  All,
+  Any,
+  Count,
+  CountTable,
+  Mean,
+  Min,
+  Max,
+  Product,
+  Sum,
+  Variance,
+  StandardDev,
+}
+
+
+table CanonicalAggregate {
+  id: CanonicalAggregateId;
+}
+
+table NonCanonicalAggregate {
+  name_space: string;
+  name: string (required);
+}
+
+union AggregateImpl {
+  CanonicalAggregate,
+  NonCanonicalAggregate,
+}
+
+table AggregateCall {
+  /// The kind of aggregate function being executed
+  kind: AggregateImpl (required);
+
+  /// Aggregate expression arguments
+  arguments: [Expression] (required);
+
+  /// Possible ordering.
+  orderings: [SortKey];
+
+  /// optional per-aggregate filtering
+  predicate: Expression;
+}
+
+/// An expression is one of
+/// - a Literal datum
+/// - a reference to a field from a Relation
+/// - a call to a named function
+/// - a case expression
+/// - a cast expression
+/// - an extract operation
+/// - a window function call
+/// - an aggregate function call
+///
+/// The expressions here that look like function calls such as
+/// Cast,Case and Extract are special in that while they might
+/// fit into a Call, they don't cleanly do so without having
+/// to pass around non-expression arguments as metadata.
+///
+/// AggregateCall and WindowCall are also separate variants
+/// due to special options for each that don't apply to generic
+/// function calls. Again this is done to make it easier
+/// for consumers to deal with the structure of the operation
+union ExpressionImpl {
+  Literal,
+  FieldRef,
+  Call,
+  Case,
+  Cast,
+  Extract,
+  WindowCall,
+  AggregateCall,
+}
+
+/// Expression types
+///
+/// Expressions have a concrete `impl` value, which is a specific operation
+/// They also have a `type` field, which is the output type of the expression,
+/// regardless of operation type.
+///
+/// The only exception so far is Cast, which has a type as input argument, which
+/// is equal to output type.
+table Expression {
+  impl: ExpressionImpl (required);
+
+  /// The type of the expression.
+  ///
+  /// This is a field, because the Type union in Schema.fbs
+  /// isn't self-contained: Fields are necessary to describe complex types
+  /// and there's currently no reason to optimize the storage of this.
+  type: org.apache.arrow.flatbuf.Field;

Review comment:
       @westonpace 
   
   > This sounds like the user needs to create overloads based solely on the 
return type. 
   
   Assuming that by "user" you mean the IR producer, what you said isn't 
correct, if I understand you correctly. The IR producer's behavior with respect 
to what IR it chooses to generate is entirely up to the producer. The IR has 
nothing to do with type derivation at all.
   
   > This feels like a slippery slope. Wouldn't it be better for DuckDB in this 
case to extend the fbs with their own fields?
   
   Maybe. We can certainly just leave out types altogether and tell consumers 
that they need to have a way to determine the type of an expression in IR. To 
me that seems like it's going to breed a bunch of code duplication across 
different consumers, but maybe it's less onerous than I think.
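
To illustrate the duplication concern, here is a rough Python sketch of the type derivation each consumer would need if the IR carried no types on computed expressions. The `Expr` dataclass and the `RETURN_TYPES` table are hypothetical and exist only for this example; they do not mirror the generated FlatBuffers classes or any consumer's real rules.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical, flattened stand-in for an IR expression node.
@dataclass(frozen=True)
class Expr:
    kind: str                       # "Literal", "FieldRef" or "Call"
    type: Optional[str] = None      # producer-supplied output type, if present
    name: str = ""                  # function name when kind == "Call"
    arguments: Tuple["Expr", ...] = ()

# Hypothetical per-consumer typing rules; every consumer would need its own copy.
RETURN_TYPES = {
    ("Add", ("int64", "int64")): "int64",
    ("Less", ("int64", "int64")): "bool",
}

def output_type(expr):
    """Use the producer-supplied type if present, otherwise derive it."""
    if expr.type is not None:
        return expr.type
    if expr.kind == "Call":
        # Derive argument types bottom-up, then look up the return type.
        arg_types = tuple(output_type(a) for a in expr.arguments)
        try:
            return RETURN_TYPES[(expr.name, arg_types)]
        except KeyError:
            raise TypeError(f"cannot type {expr.name}{arg_types}") from None
    raise TypeError(f"{expr.kind} must carry an explicit type")
```

With the `type` field populated by the producer, the first branch is all a consumer needs. Without it, each consumer has to maintain something like `RETURN_TYPES` and the recursive walk itself.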





-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

