wesm commented on a change in pull request #10934:
URL: https://github.com/apache/arrow/pull/10934#discussion_r697746512



##########
File path: format/experimental/computeir/Literal.fbs
##########
@@ -0,0 +1,177 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+table ArrayLiteral {
+  values: [Literal];
+}
+
+table StructLiteral {
+  values: [KeyValue];
+}
+
+table KeyValue {
+  key: string;
+  value: Literal;
+}
+
+table MapLiteral {
+  values: [KeyValue];
+}
+
+table Int8Literal {
+  value: int8;
+}
+
+table Int16Literal {
+  value: int16;
+}
+
+table Int32Literal {
+  value: int32;
+}
+
+table Int64Literal {
+  value: int64;
+}
+
+table UInt8Literal {
+  value: uint8;
+}
+
+table UInt16Literal {
+  value: uint16;
+}
+
+table UInt32Literal {
+  value: uint32;
+}
+
+table UInt64Literal {
+  value: uint64;
+}
+
+table Float16Literal {
+  value: uint16;
+}
+
+table Float32Literal {
+  value: float32;
+}
+
+table Float64Literal {
+  value: float64;
+}
+
+table DecimalLiteral {
+  value: [uint8]; // 128 bit value.

Review comment:
       nit: the Arrow spec has left the door open to smaller decimals

##########
File path: format/experimental/computeir/Literal.fbs
##########
@@ -0,0 +1,177 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+table ArrayLiteral {
+  values: [Literal];
+}
+
+table StructLiteral {
+  values: [KeyValue];
+}
+
+table KeyValue {
+  key: string;
+  value: Literal;
+}
+
+table MapLiteral {
+  values: [KeyValue];
+}
+
+table Int8Literal {
+  value: int8;
+}
+
+table Int16Literal {
+  value: int16;
+}
+
+table Int32Literal {
+  value: int32;
+}
+
+table Int64Literal {
+  value: int64;
+}
+
+table UInt8Literal {
+  value: uint8;
+}
+
+table UInt16Literal {
+  value: uint16;
+}
+
+table UInt32Literal {
+  value: uint32;
+}
+
+table UInt64Literal {
+  value: uint64;
+}
+
+table Float16Literal {
+  value: uint16;
+}
+
+table Float32Literal {
+  value: float32;
+}
+
+table Float64Literal {
+  value: float64;
+}
+
+table DecimalLiteral {
+  value: [uint8]; // 128 bit value.
+  scale: uint8;
+  precision: uint8;
+}
+
+table BooleanLiteral {
+  value: bool;
+}
+
+table NullLiteral {}
+
+table DateLiteral {
+  // milliseconds since epoch
+  value: int64;
+}
+
+table TimeLiteral {
+  // nanosecond time value.
+  value: int64;
+}
+
+table TimestampLiteral {
+  // microsecond timestamp.
+  value: int64;

Review comment:
       nit: need unit 

##########
File path: format/experimental/computeir/Literal.fbs
##########
@@ -0,0 +1,177 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+table ArrayLiteral {
+  values: [Literal];
+}
+
+table StructLiteral {
+  values: [KeyValue];
+}
+
+table KeyValue {
+  key: string;
+  value: Literal;
+}
+
+table MapLiteral {
+  values: [KeyValue];
+}
+
+table Int8Literal {
+  value: int8;
+}
+
+table Int16Literal {
+  value: int16;
+}
+
+table Int32Literal {
+  value: int32;
+}
+
+table Int64Literal {
+  value: int64;
+}
+
+table UInt8Literal {
+  value: uint8;
+}
+
+table UInt16Literal {
+  value: uint16;
+}
+
+table UInt32Literal {
+  value: uint32;
+}
+
+table UInt64Literal {
+  value: uint64;
+}
+
+table Float16Literal {
+  value: uint16;
+}
+
+table Float32Literal {
+  value: float32;
+}
+
+table Float64Literal {
+  value: float64;
+}
+
+table DecimalLiteral {
+  value: [uint8]; // 128 bit value.
+  scale: uint8;
+  precision: uint8;
+}
+
+table BooleanLiteral {
+  value: bool;
+}
+
+table NullLiteral {}
+
+table DateLiteral {
+  // milliseconds since epoch
+  value: int64;
+}
+
+table TimeLiteral {
+  // nanosecond time value.

Review comment:
       nit: should we put a unit here? 

##########
File path: format/experimental/computeir/Expression.fbs
##########
@@ -0,0 +1,351 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+include "../../Schema.fbs";
+include "Literal.fbs";
+include "InlineBuffer.fbs";
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+/// Access a value for a given map key
+table MapKey {
+  key: string (required);
+}
+
+/// Struct field access
+table StructField {
+  /// The position of the field in the struct schema
+  position: uint32;
+}
+
+/// Zero-based array index
+table ArraySubscript {
+  position: uint32;
+}
+
+/// Zero-based range of elements in an array
+table ArraySlice {
+  /// The start of an array slice, inclusive
+  start_inclusive: uint32;
+  /// The end of an array slice, exclusive
+  end_exclusive: uint32;
+}
+
+/// Field name in a relation
+table FieldName {
+  position: uint32;
+}
+
+/// A union of possible dereference operations
+union Deref {
+  /// Access a value for a given map key
+  MapKey,
+  /// Access the value at a struct field
+  StructField,
+  /// Access the element at a given index in an array
+  ArraySubscript,
+  /// Access a range of elements in an array
+  ArraySlice,
+  /// Access a field of a relation
+  FieldName,
+}
+
+/// Access the data of a field
+table FieldRef {
+  /// A sequence of field names to allow referencing potentially nested fields
+  ref: Deref (required);
+  /// For Expressions which might reference fields in multiple Relations,
+  /// this index may be provided to indicate which Relation's fields
+  /// `path` points into. For example in the case of a join,
+  /// 0 refers to the left relation and 1 to the right relation.
+  relation_index: int;
+}
+
+/// A canonical (probably SQL equivalent) function
+//
+// TODO: variadics
+enum CanonicalFunctionId : uint32 {

Review comment:
       > My suggestion is ids here, a separate yaml or similar structured doc 
that lists canonical functions that includes not only structured data around 
type, etc but also description/details around specific behavior.
   
   I think for scalar functions, since there are so many (but not necessarily 
relational operators), I also prefer this approach to having all the functions 
listed in an enum on the Flatbuffers file, so that we can have an append-only 
yaml file with all the functions. Using an integer id to identify a function 
versus a string makes the IR smaller (good — only ever need 4 bytes, even only 
2 bytes, to identify a function) and implementations marginally less 
complicated since many engines will have string identifiers for functions that 
are different than the "canonical" names in our function inventory. The 
inventory of scalar functions is likely to grow very fast and not having to 
modify the Flatbuffers files would be beneficial 
   
   > Just because new functions are introduced doesn't mean we should change 
the format version number. The absence of a function in a particular consumer 
shouldn't really matter.
   
   I agree with this
   
   > Each of these will actually have many variations and each should be 
identified separately add(int,int) => int vs add(bigint,bigint) => bigint.
   
   I see arguments both ways on using a different function id for each overload 
of a function. The main annoyance I would see in having a different id for each 
overload of a function is that IR implementations would have to maintain a huge 
IR mapping table, versus using a common id for all variants of "add" for 
example. On the other hand, if there is a different id for every overload, then 
it leaves no ambiguity about what the input and output types should be. But 
that is a massive scope increase for this project to enumerate all the function 
signatures of every function overload contemplated... 

##########
File path: format/experimental/computeir/Expression.fbs
##########
@@ -0,0 +1,351 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+include "../../Schema.fbs";
+include "Literal.fbs";
+include "InlineBuffer.fbs";
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+/// Access a value for a given map key
+table MapKey {
+  key: string (required);
+}
+
+/// Struct field access
+table StructField {
+  /// The position of the field in the struct schema
+  position: uint32;
+}
+
+/// Zero-based array index
+table ArraySubscript {
+  position: uint32;
+}
+
+/// Zero-based range of elements in an array
+table ArraySlice {
+  /// The start of an array slice, inclusive
+  start_inclusive: uint32;
+  /// The end of an array slice, exclusive
+  end_exclusive: uint32;
+}
+
+/// Field name in a relation
+table FieldName {
+  position: uint32;
+}
+
+/// A union of possible dereference operations
+union Deref {
+  /// Access a value for a given map key
+  MapKey,
+  /// Access the value at a struct field
+  StructField,
+  /// Access the element at a given index in an array
+  ArraySubscript,
+  /// Access a range of elements in an array
+  ArraySlice,
+  /// Access a field of a relation
+  FieldName,
+}
+
+/// Access the data of a field
+table FieldRef {
+  /// A sequence of field names to allow referencing potentially nested fields
+  ref: Deref (required);
+  /// For Expressions which might reference fields in multiple Relations,
+  /// this index may be provided to indicate which Relation's fields
+  /// `path` points into. For example in the case of a join,
+  /// 0 refers to the left relation and 1 to the right relation.
+  relation_index: int;
+}
+
+/// A canonical (probably SQL equivalent) function
+//
+// TODO: variadics
+enum CanonicalFunctionId : uint32 {
+  // logical
+  And,
+  Not,
+  Or,
+
+  // arithmetic
+  Add,
+  Subtract,
+  Multiply,
+  Divide,
+  Power,
+  AbsoluteValue,
+  Negate,
+  Sign,
+
+  // date/time/timestamp operations
+  DateSub,
+  DateAdd,
+  DateDiff,
+  TimeAdd,
+  TimeSub,
+  TimeDiff,
+  TimestampAdd,
+  TimestampSub,
+  TimestampDiff,
+
+  // comparison
+  Equals,
+  NotEquals,
+  Greater,
+  GreaterEqual,
+  Less,
+  LessEqual,
+}
+
+table CanonicalFunction {
+  id: CanonicalFunctionId;
+}
+
+table NonCanonicalFunction {
+  name_space: string;
+  name: string (required);
+}
+
+union FunctionImpl {
+  CanonicalFunction,
+  NonCanonicalFunction,
+}
+
+/// A function call expression
+table Call {
+  /// The kind of function call this is.
+  kind: FunctionImpl (required);
+
+  /// The arguments passed to `function_name`.
+  arguments: [Expression] (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// A single WHEN x THEN y fragment.
+table CaseFragment {
+  when: Expression (required);
+  then: Expression (required);
+}
+
+/// Case statement-style expression.
+table Case {
+  cases: [CaseFragment] (required);
+  /// The default value if no cases match. This is typically NULL in SQL
+  //implementations.
+  ///
+  /// Defaulting to NULL is a frontend choice, so producers must specify NULL
+  /// if that's their desired behavior.
+  default: Expression (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Cast {
+  /// The expression to cast
+  expression: Expression (required);
+
+  /// The type to cast `argument` to.
+  type: org.apache.arrow.flatbuf.Field (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Extract {
+  /// Expression from which to extract components.
+  expression: Expression (required);
+
+  /// Field to extract from `expression`.
+  field: string (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// Whether lesser values should precede greater or vice versa,
+/// also whether nulls should preced or follow values.
+enum Ordering : uint8 {
+  ASCENDING_THEN_NULLS,
+  DESCENDING_THEN_NULLS,
+  NULLS_THEN_ASCENDING,
+  NULLS_THEN_DESCENDING
+}
+
+/// An expression with an order
+table SortKey {
+  expression: Expression (required);
+  ordering: Ordering = ASCENDING_THEN_NULLS;
+}
+
+/// Boundary is unbounded
+table Unbounded {}
+
+union ConcreteBoundImpl {
+  Expression,
+  Unbounded,
+}
+
+/// Boundary is preceding rows, determined by the contained expression
+table Preceding {
+  ipml: ConcreteBoundImpl (required);
+}
+
+/// Boundary is following rows, determined by the contained expression
+table Following {
+  impl: ConcreteBoundImpl (required);
+}
+
+/// Boundary is the current row
+table CurrentRow {}
+
+union BoundImpl {
+  Preceding,
+  Following,
+  CurrentRow,
+}
+
+/// Boundary of a window
+table Bound {
+  impl: BoundImpl (required);
+}
+
+/// The kind of window function to be executed.
+enum Frame : uint8 {
+  Rows,
+  Range,
+}
+
+/// An expression representing a window function call.
+table WindowCall {
+  /// The kind of window frame
+  kind: Frame;
+  /// The expression to operate over
+  expression: Expression (required);
+  /// Partition keys
+  partitions: [Expression] (required);
+  /// Sort keys
+  orderings: [SortKey] (required);
+  /// Lower window bound
+  lower_bound: Bound (required);
+  /// Upper window bound
+  upper_bound: Bound (required);
+}
+
+/// A canonical (probably SQL equivalent) function
+enum CanonicalAggregateId : uint32 {
+  All,
+  Any,
+  Count,
+  CountTable,
+  Mean,
+  Min,
+  Max,
+  Product,
+  Sum,
+  Variance,
+  StandardDev,
+}
+
+
+table CanonicalAggregate {
+  id: CanonicalAggregateId;
+}
+
+table NonCanonicalAggregate {
+  name_space: string;
+  name: string (required);
+}
+
+union AggregateImpl {
+  CanonicalAggregate,
+  NonCanonicalAggregate,
+}
+
+table AggregateCall {
+  /// The kind of aggregate function being executed
+  kind: AggregateImpl (required);
+
+  /// Aggregate expression arguments
+  arguments: [Expression] (required);
+
+  /// Possible ordering.
+  orderings: [SortKey];
+
+  /// optional per-aggregate filtering
+  predicate: Expression;
+}
+
+/// An expression is one of
+/// - a Literal datum
+/// - a reference to a field from a Relation
+/// - a call to a named function
+/// - a case expression
+/// - a cast expression
+/// - an extract operation
+/// - a window function call
+/// - an aggregate function call
+///
+/// The expressions here that look like function calls such as
+/// Cast,Case and Extract are special in that while they might
+/// fit into a Call, they don't cleanly do so without having
+/// to pass around non-expression arguments as metadata.
+///
+/// AggregateCall and WindowCall are also separate variants
+/// due to special options for each that don't apply to generic
+/// function calls. Again this is done to make it easier
+/// for consumers to deal with the structure of the operation
+union ExpressionImpl {
+  Literal,
+  FieldRef,
+  Call,
+  Case,
+  Cast,
+  Extract,
+  WindowCall,
+  AggregateCall,

Review comment:
       I think it would be useful to be able to serialize and send e.g. 
`sum($expr_0) / mean($expr_1)` (with these expressions being possibly unbound 
to a particular table schema) without having to build an aggregation relational 
operator — if aggregation function calls are "different" then the type system 
to achieve this is probably a bit more complex, if you have ideas

##########
File path: format/experimental/computeir/Literal.fbs
##########
@@ -0,0 +1,177 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+table ArrayLiteral {

Review comment:
       nit: ListLiteral ?

##########
File path: format/experimental/computeir/Expression.fbs
##########
@@ -0,0 +1,351 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+include "../../Schema.fbs";
+include "Literal.fbs";
+include "InlineBuffer.fbs";
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+/// Access a value for a given map key
+table MapKey {
+  key: string (required);
+}
+
+/// Struct field access
+table StructField {
+  /// The position of the field in the struct schema
+  position: uint32;
+}
+
+/// Zero-based array index
+table ArraySubscript {
+  position: uint32;
+}
+
+/// Zero-based range of elements in an array
+table ArraySlice {
+  /// The start of an array slice, inclusive
+  start_inclusive: uint32;
+  /// The end of an array slice, exclusive
+  end_exclusive: uint32;
+}
+
+/// Field name in a relation
+table FieldName {
+  position: uint32;
+}
+
+/// A union of possible dereference operations
+union Deref {
+  /// Access a value for a given map key
+  MapKey,
+  /// Access the value at a struct field
+  StructField,
+  /// Access the element at a given index in an array
+  ArraySubscript,
+  /// Access a range of elements in an array
+  ArraySlice,
+  /// Access a field of a relation
+  FieldName,
+}
+
+/// Access the data of a field
+table FieldRef {
+  /// A sequence of field names to allow referencing potentially nested fields
+  ref: Deref (required);
+  /// For Expressions which might reference fields in multiple Relations,
+  /// this index may be provided to indicate which Relation's fields
+  /// `path` points into. For example in the case of a join,
+  /// 0 refers to the left relation and 1 to the right relation.
+  relation_index: int;
+}
+
+/// A canonical (probably SQL equivalent) function
+//
+// TODO: variadics
+enum CanonicalFunctionId : uint32 {
+  // logical
+  And,
+  Not,
+  Or,
+
+  // arithmetic
+  Add,
+  Subtract,
+  Multiply,
+  Divide,
+  Power,
+  AbsoluteValue,
+  Negate,
+  Sign,
+
+  // date/time/timestamp operations
+  DateSub,
+  DateAdd,
+  DateDiff,
+  TimeAdd,
+  TimeSub,
+  TimeDiff,
+  TimestampAdd,
+  TimestampSub,
+  TimestampDiff,
+
+  // comparison
+  Equals,
+  NotEquals,
+  Greater,
+  GreaterEqual,
+  Less,
+  LessEqual,
+}
+
+table CanonicalFunction {
+  id: CanonicalFunctionId;
+}
+
+table NonCanonicalFunction {
+  name_space: string;
+  name: string (required);
+}
+
+union FunctionImpl {
+  CanonicalFunction,
+  NonCanonicalFunction,
+}
+
+/// A function call expression
+table Call {
+  /// The kind of function call this is.
+  kind: FunctionImpl (required);
+
+  /// The arguments passed to `function_name`.
+  arguments: [Expression] (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// A single WHEN x THEN y fragment.
+table CaseFragment {
+  when: Expression (required);
+  then: Expression (required);
+}
+
+/// Case statement-style expression.
+table Case {
+  cases: [CaseFragment] (required);
+  /// The default value if no cases match. This is typically NULL in SQL
+  //implementations.
+  ///
+  /// Defaulting to NULL is a frontend choice, so producers must specify NULL
+  /// if that's their desired behavior.
+  default: Expression (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Cast {
+  /// The expression to cast
+  expression: Expression (required);
+
+  /// The type to cast `argument` to.
+  type: org.apache.arrow.flatbuf.Field (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Extract {
+  /// Expression from which to extract components.
+  expression: Expression (required);
+
+  /// Field to extract from `expression`.
+  field: string (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// Whether lesser values should precede greater or vice versa,
+/// also whether nulls should preced or follow values.
+enum Ordering : uint8 {
+  ASCENDING_THEN_NULLS,
+  DESCENDING_THEN_NULLS,
+  NULLS_THEN_ASCENDING,
+  NULLS_THEN_DESCENDING
+}
+
+/// An expression with an order
+table SortKey {
+  expression: Expression (required);
+  ordering: Ordering = ASCENDING_THEN_NULLS;
+}
+
+/// Boundary is unbounded
+table Unbounded {}
+
+union ConcreteBoundImpl {
+  Expression,
+  Unbounded,
+}
+
+/// Boundary is preceding rows, determined by the contained expression
+table Preceding {
+  ipml: ConcreteBoundImpl (required);
+}
+
+/// Boundary is following rows, determined by the contained expression
+table Following {
+  impl: ConcreteBoundImpl (required);
+}
+
+/// Boundary is the current row
+table CurrentRow {}
+
+union BoundImpl {
+  Preceding,
+  Following,
+  CurrentRow,
+}
+
+/// Boundary of a window
+table Bound {
+  impl: BoundImpl (required);
+}
+
+/// The kind of window function to be executed.
+enum Frame : uint8 {
+  Rows,
+  Range,
+}
+
+/// An expression representing a window function call.
+table WindowCall {
+  /// The kind of window frame
+  kind: Frame;
+  /// The expression to operate over
+  expression: Expression (required);
+  /// Partition keys
+  partitions: [Expression] (required);
+  /// Sort keys
+  orderings: [SortKey] (required);
+  /// Lower window bound
+  lower_bound: Bound (required);
+  /// Upper window bound
+  upper_bound: Bound (required);
+}
+
+/// A canonical (probably SQL equivalent) function
+enum CanonicalAggregateId : uint32 {
+  All,

Review comment:
       I agree that whatever decision is made with the scalar functions should 
be consistent here

##########
File path: format/experimental/computeir/Expression.fbs
##########
@@ -0,0 +1,351 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+include "../../Schema.fbs";
+include "Literal.fbs";
+include "InlineBuffer.fbs";
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+/// Access a value for a given map key
+table MapKey {
+  key: string (required);
+}
+
+/// Struct field access
+table StructField {
+  /// The position of the field in the struct schema
+  position: uint32;
+}
+
+/// Zero-based array index
+table ArraySubscript {
+  position: uint32;
+}
+
+/// Zero-based range of elements in an array
+table ArraySlice {
+  /// The start of an array slice, inclusive
+  start_inclusive: uint32;
+  /// The end of an array slice, exclusive
+  end_exclusive: uint32;
+}
+
+/// Field name in a relation
+table FieldName {
+  position: uint32;
+}
+
+/// A union of possible dereference operations
+union Deref {
+  /// Access a value for a given map key
+  MapKey,
+  /// Access the value at a struct field
+  StructField,
+  /// Access the element at a given index in an array
+  ArraySubscript,
+  /// Access a range of elements in an array
+  ArraySlice,
+  /// Access a field of a relation
+  FieldName,
+}
+
+/// Access the data of a field
+table FieldRef {
+  /// A sequence of field names to allow referencing potentially nested fields
+  ref: Deref (required);
+  /// For Expressions which might reference fields in multiple Relations,
+  /// this index may be provided to indicate which Relation's fields
+  /// `path` points into. For example in the case of a join,
+  /// 0 refers to the left relation and 1 to the right relation.
+  relation_index: int;
+}
+
+/// A canonical (probably SQL equivalent) function
+//
+// TODO: variadics
+enum CanonicalFunctionId : uint32 {
+  // logical
+  And,
+  Not,
+  Or,
+
+  // arithmetic
+  Add,
+  Subtract,
+  Multiply,
+  Divide,
+  Power,
+  AbsoluteValue,
+  Negate,
+  Sign,
+
+  // date/time/timestamp operations
+  DateSub,
+  DateAdd,
+  DateDiff,
+  TimeAdd,
+  TimeSub,
+  TimeDiff,
+  TimestampAdd,
+  TimestampSub,
+  TimestampDiff,
+
+  // comparison
+  Equals,
+  NotEquals,
+  Greater,
+  GreaterEqual,
+  Less,
+  LessEqual,
+}
+
+table CanonicalFunction {
+  id: CanonicalFunctionId;
+}
+
+table NonCanonicalFunction {
+  name_space: string;
+  name: string (required);
+}
+
+union FunctionImpl {
+  CanonicalFunction,
+  NonCanonicalFunction,
+}
+
+/// A function call expression
+table Call {
+  /// The kind of function call this is.
+  kind: FunctionImpl (required);
+
+  /// The arguments passed to `function_name`.
+  arguments: [Expression] (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// A single WHEN x THEN y fragment.
+table CaseFragment {
+  when: Expression (required);
+  then: Expression (required);
+}
+
+/// Case statement-style expression.
+table Case {
+  cases: [CaseFragment] (required);
+  /// The default value if no cases match. This is typically NULL in SQL
+  //implementations.
+  ///
+  /// Defaulting to NULL is a frontend choice, so producers must specify NULL
+  /// if that's their desired behavior.
+  default: Expression (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Cast {
+  /// The expression to cast
+  expression: Expression (required);
+
+  /// The type to cast `argument` to.
+  type: org.apache.arrow.flatbuf.Field (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Extract {
+  /// Expression from which to extract components.
+  expression: Expression (required);
+
+  /// Field to extract from `expression`.
+  field: string (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// Whether lesser values should precede greater or vice versa,
+/// also whether nulls should preced or follow values.
+enum Ordering : uint8 {
+  ASCENDING_THEN_NULLS,
+  DESCENDING_THEN_NULLS,
+  NULLS_THEN_ASCENDING,
+  NULLS_THEN_DESCENDING
+}
+
+/// An expression with an order
+table SortKey {
+  expression: Expression (required);
+  ordering: Ordering = ASCENDING_THEN_NULLS;
+}
+
+/// Boundary is unbounded
+table Unbounded {}
+
+union ConcreteBoundImpl {
+  Expression,
+  Unbounded,
+}
+
+/// Boundary is preceding rows, determined by the contained expression
+table Preceding {
+  ipml: ConcreteBoundImpl (required);
+}
+
+/// Boundary is following rows, determined by the contained expression
+table Following {
+  impl: ConcreteBoundImpl (required);

Review comment:
       I agree where a table is being introduced solely to work around 
Flatbuffers' union issues, that calling it `Wrapper` would make it more clear. 
The rule then would be to never use a `ThingWrapper` as a member of any other 
table, only where you need to put the wrapped union in an array or serialize it 
as a top-level Flatbuffers object 

##########
File path: format/experimental/computeir/Literal.fbs
##########
@@ -0,0 +1,177 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+table ArrayLiteral {
+  values: [Literal];
+}
+
+table StructLiteral {
+  values: [KeyValue];
+}
+
+table KeyValue {
+  key: string;
+  value: Literal;
+}
+
+table MapLiteral {
+  values: [KeyValue];
+}
+
+table Int8Literal {
+  value: int8;
+}
+
+table Int16Literal {
+  value: int16;
+}
+
+table Int32Literal {
+  value: int32;
+}
+
+table Int64Literal {
+  value: int64;
+}
+
+table UInt8Literal {
+  value: uint8;
+}
+
+table UInt16Literal {
+  value: uint16;
+}
+
+table UInt32Literal {
+  value: uint32;
+}
+
+table UInt64Literal {
+  value: uint64;
+}
+
+table Float16Literal {
+  value: uint16;
+}
+
+table Float32Literal {
+  value: float32;
+}
+
+table Float64Literal {
+  value: float64;
+}
+
+table DecimalLiteral {
+  value: [uint8]; // 128 bit value.
+  scale: uint8;
+  precision: uint8;
+}
+
+table BooleanLiteral {
+  value: bool;
+}
+
+table NullLiteral {}
+
+table DateLiteral {
+  // milliseconds since epoch
+  value: int64;
+}
+
+table TimeLiteral {
+  // nanosecond time value.
+  value: int64;
+}
+
+table TimestampLiteral {
+  // microsecond timestamp.
+  value: int64;
+
+  // timezone value, same definition as used in Schema.fbs.
+  timezone: string;
+}
+
+table IntervalLiteralMonths {
+  months: int32;
+}
+
+table IntervalLiteralDaysMilliseconds {
+  days: int32;
+  milliseconds: int32;
+}
+
+union IntervalLiteralImpl {
+  IntervalLiteralMonths,
+  IntervalLiteralDaysMilliseconds,
+}
+
+table IntervalLiteral {
+  value: IntervalLiteralImpl;
+}
+
+table BinaryLiteral {
+  value: [byte];
+}
+
+table StringLiteral {
+  value: string;
+}
+
+// no union literal is defined as only one branch of a union can be resolved.
+// no literals for large string/binary types as flatbuffer is limited to 2gb.
+
+union LiteralImpl {
+  NullLiteral,
+  BooleanLiteral,
+
+  Int8Literal,
+  Int16Literal,
+  Int32Literal,
+  Int64Literal,
+
+  UInt8Literal,
+  UInt16Literal,
+  UInt32Literal,
+  UInt64Literal,
+
+  DateLiteral,
+  TimeLiteral,
+  TimestampLiteral,
+  IntervalLiteral,

Review comment:
       nit: need Duration

##########
File path: format/experimental/computeir/Expression.fbs
##########
@@ -0,0 +1,351 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+include "../../Schema.fbs";
+include "Literal.fbs";
+include "InlineBuffer.fbs";
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+/// Access a value for a given map key
+table MapKey {
+  key: string (required);
+}
+
+/// Struct field access
+table StructField {
+  /// The position of the field in the struct schema
+  position: uint32;
+}
+
+/// Zero-based array index
+table ArraySubscript {
+  position: uint32;
+}
+
+/// Zero-based range of elements in an array
+table ArraySlice {
+  /// The start of an array slice, inclusive
+  start_inclusive: uint32;
+  /// The end of an array slice, exclusive
+  end_exclusive: uint32;
+}
+
+/// Field name in a relation
+table FieldName {
+  position: uint32;
+}
+
+/// A union of possible dereference operations
+union Deref {
+  /// Access a value for a given map key
+  MapKey,
+  /// Access the value at a struct field
+  StructField,
+  /// Access the element at a given index in an array
+  ArraySubscript,
+  /// Access a range of elements in an array
+  ArraySlice,
+  /// Access a field of a relation
+  FieldName,
+}
+
+/// Access the data of a field
+table FieldRef {
+  /// A sequence of field names to allow referencing potentially nested fields
+  ref: Deref (required);
+  /// For Expressions which might reference fields in multiple Relations,
+  /// this index may be provided to indicate which Relation's fields
+  /// `path` points into. For example in the case of a join,
+  /// 0 refers to the left relation and 1 to the right relation.
+  relation_index: int;
+}
+
+/// A canonical (probably SQL equivalent) function
+//
+// TODO: variadics
+enum CanonicalFunctionId : uint32 {
+  // logical
+  And,
+  Not,
+  Or,
+
+  // arithmetic
+  Add,
+  Subtract,
+  Multiply,
+  Divide,
+  Power,
+  AbsoluteValue,
+  Negate,
+  Sign,
+
+  // date/time/timestamp operations
+  DateSub,
+  DateAdd,
+  DateDiff,
+  TimeAdd,
+  TimeSub,
+  TimeDiff,
+  TimestampAdd,
+  TimestampSub,
+  TimestampDiff,
+
+  // comparison
+  Equals,
+  NotEquals,
+  Greater,
+  GreaterEqual,
+  Less,
+  LessEqual,
+}
+
+table CanonicalFunction {
+  id: CanonicalFunctionId;
+}
+
+table NonCanonicalFunction {
+  name_space: string;
+  name: string (required);
+}
+
+union FunctionImpl {
+  CanonicalFunction,
+  NonCanonicalFunction,
+}
+
+/// A function call expression
+table Call {
+  /// The kind of function call this is.
+  kind: FunctionImpl (required);
+
+  /// The arguments passed to `function_name`.
+  arguments: [Expression] (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// A single WHEN x THEN y fragment.
+table CaseFragment {
+  when: Expression (required);
+  then: Expression (required);
+}
+
+/// Case statement-style expression.
+table Case {
+  cases: [CaseFragment] (required);
+  /// The default value if no cases match. This is typically NULL in SQL
+  //implementations.
+  ///
+  /// Defaulting to NULL is a frontend choice, so producers must specify NULL
+  /// if that's their desired behavior.
+  default: Expression (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Cast {
+  /// The expression to cast
+  expression: Expression (required);
+
+  /// The type to cast `argument` to.
+  type: org.apache.arrow.flatbuf.Field (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+table Extract {
+  /// Expression from which to extract components.
+  expression: Expression (required);
+
+  /// Field to extract from `expression`.
+  field: string (required);
+
+  /// Parameters for `function_name`; content/format may be unique to each
+  /// value of `function_name`.
+  metadata: InlineBuffer;
+}
+
+/// Whether lesser values should precede greater or vice versa,
+/// also whether nulls should preced or follow values.
+enum Ordering : uint8 {
+  ASCENDING_THEN_NULLS,
+  DESCENDING_THEN_NULLS,
+  NULLS_THEN_ASCENDING,
+  NULLS_THEN_DESCENDING
+}
+
+/// An expression with an order
+table SortKey {
+  expression: Expression (required);
+  ordering: Ordering = ASCENDING_THEN_NULLS;
+}
+
+/// Boundary is unbounded
+table Unbounded {}
+
+union ConcreteBoundImpl {
+  Expression,
+  Unbounded,
+}
+
+/// Boundary is preceding rows, determined by the contained expression
+table Preceding {
+  ipml: ConcreteBoundImpl (required);
+}
+
+/// Boundary is following rows, determined by the contained expression
+table Following {
+  impl: ConcreteBoundImpl (required);
+}
+
+/// Boundary is the current row
+table CurrentRow {}
+
+union BoundImpl {
+  Preceding,
+  Following,
+  CurrentRow,
+}
+
+/// Boundary of a window
+table Bound {
+  impl: BoundImpl (required);
+}
+
+/// The kind of window function to be executed.
+enum Frame : uint8 {
+  Rows,
+  Range,
+}
+
+/// An expression representing a window function call.
+table WindowCall {
+  /// The kind of window frame
+  kind: Frame;
+  /// The expression to operate over
+  expression: Expression (required);
+  /// Partition keys
+  partitions: [Expression] (required);
+  /// Sort keys
+  orderings: [SortKey] (required);
+  /// Lower window bound
+  lower_bound: Bound (required);
+  /// Upper window bound
+  upper_bound: Bound (required);
+}
+
+/// A canonical (probably SQL equivalent) function
+enum CanonicalAggregateId : uint32 {
+  All,
+  Any,
+  Count,
+  CountTable,
+  Mean,
+  Min,
+  Max,
+  Product,
+  Sum,
+  Variance,
+  StandardDev,
+}
+
+
+table CanonicalAggregate {
+  id: CanonicalAggregateId;
+}
+
+table NonCanonicalAggregate {
+  name_space: string;
+  name: string (required);
+}
+
+union AggregateImpl {
+  CanonicalAggregate,
+  NonCanonicalAggregate,
+}
+
+table AggregateCall {
+  /// The kind of aggregate function being executed
+  kind: AggregateImpl (required);
+
+  /// Aggregate expression arguments
+  arguments: [Expression] (required);
+
+  /// Possible ordering.
+  orderings: [SortKey];
+
+  /// optional per-aggregate filtering
+  predicate: Expression;
+}
+
+/// An expression is one of
+/// - a Literal datum
+/// - a reference to a field from a Relation
+/// - a call to a named function
+/// - a case expression
+/// - a cast expression
+/// - an extract operation
+/// - a window function call
+/// - an aggregate function call
+///
+/// The expressions here that look like function calls such as
+/// Cast,Case and Extract are special in that while they might
+/// fit into a Call, they don't cleanly do so without having
+/// to pass around non-expression arguments as metadata.
+///
+/// AggregateCall and WindowCall are also separate variants
+/// due to special options for each that don't apply to generic
+/// function calls. Again this is done to make it easier
+/// for consumers to deal with the structure of the operation
+union ExpressionImpl {
+  Literal,
+  FieldRef,
+  Call,
+  Case,
+  Cast,
+  Extract,
+  WindowCall,
+  AggregateCall,
+}
+
+/// Expression types
+///
+/// Expressions have a concrete `impl` value, which is a specific operation
+/// They also have a `type` field, which is the output type of the expression,
+/// regardless of operation type.
+///
+/// The only exception so far is Cast, which has a type as input argument, 
which
+/// is equal to output type.
+table Expression {
+  impl: ExpressionImpl (required);
+
+  /// The type of the expression.
+  ///
+  /// This is a field, because the Type union in Schema.fbs
+  /// isn't self-contained: Fields are necessary to describe complex types
+  /// and there's currently no reason to optimize the storage of this.
+  type: org.apache.arrow.flatbuf.Field;

Review comment:
       I think this is an important question — connected to the above question 
about enumerating function signatures — and should be raised and discussed more 
broadly on the mailing list and try to include the larger group of people who 
commented in the Google Doc that I made originally. 
   
   For built-in / canonical functions where we expect `$func($type_0, ...)` to 
yield a deterministic output type, there isn't much motivation to serialize the 
output type — you would only want to put the output type there when it is 
adding useful information. If you always put it there, you're paying the cost 
of serializing and deserializing a Field for every expression. An IR producer 
could run in "verbose" mode and put all the output types (if it were useful to 
a IR consumer that doesn't have the type derivation logic)
   
   I suspect that there will be a need to build an inventory of function 
signatures to reduce ambiguity and to make things more straightforward for IR 
implementations (for example, an implementation could read the input/output 
type rules from a text file — or generate code — rather than having to enter 
the type derivations by hand)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to