jacques-n commented on a change in pull request #10934:
URL: https://github.com/apache/arrow/pull/10934#discussion_r697944747



##########
File path: format/experimental/computeir/Expression.fbs
##########
@@ -0,0 +1,351 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+include "../../Schema.fbs";
+include "Literal.fbs";
+include "InlineBuffer.fbs";
+
+namespace org.apache.arrow.computeir.flatbuf;
+
+/// Access a value for a given map key
+table MapKey {
+  key: string (required);
+}
+
+/// Struct field access
+table StructField {
+  /// The position of the field in the struct schema
+  position: uint32;
+}
+
+/// Zero-based array index
+table ArraySubscript {
+  position: uint32;
+}
+
+/// Zero-based range of elements in an array
+table ArraySlice {
+  /// The start of an array slice, inclusive
+  start_inclusive: uint32;
+  /// The end of an array slice, exclusive
+  end_exclusive: uint32;
+}
+
+/// Field name in a relation
+table FieldName {
+  position: uint32;
+}
+
+/// A union of possible dereference operations
+union Deref {
+  /// Access a value for a given map key
+  MapKey,
+  /// Access the value at a struct field
+  StructField,
+  /// Access the element at a given index in an array
+  ArraySubscript,
+  /// Access a range of elements in an array
+  ArraySlice,
+  /// Access a field of a relation
+  FieldName,
+}
+
+/// Access the data of a field
+table FieldRef {
+  /// A sequence of field names to allow referencing potentially nested fields
+  ref: Deref (required);
+  /// For Expressions which might reference fields in multiple Relations,
+  /// this index may be provided to indicate which Relation's fields
+  /// `path` points into. For example in the case of a join,
+  /// 0 refers to the left relation and 1 to the right relation.
+  relation_index: int;
+}
+
+/// A canonical (probably SQL equivalent) function
+//
+// TODO: variadics
+enum CanonicalFunctionId : uint32 {

Review comment:
       >> But that is a massive scope increase for this project to 
   
   I guess I don't really see this as a massive increase in scope. Maybe it 
is just about being more formal sooner? Without being formal about this, I fear 
that this will become a representation used by only one implementation, and 
that the implementation will define the specification rather than the other 
way around.
   
   >> ...enumerate all the function signatures of every function overload 
contemplated...
   
   Part of it, to me, is that this will be a growing list. It doesn't have 
to start with all possible functions, only the functions we want to specify 
initially. I would expect adding new functions to be relatively 
straightforward.
   
   >> I don't follow why operations with multiple overloads need to be dealt 
with at all in the IR. Wouldn't a function have a singular definition (or be 
singularly derivable) for a given IR?
   
   From an engine implementation point of view, as an example, I feel like 
decimal division has much more in common with decimal multiplication than it 
does with integer division. It also has a very different output type 
resolution system. Overloading a single concept of division to cover different 
output type derivation systems seems quite a bit more complex than simply 
saying that there is no "overloading". To me, we simply need to consider each 
function's key to be the name plus the input argument types. Sure, two 
functions may have the same "name", but that doesn't mean they have the same 
key (or have anything to do with each other).
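
As a rough illustration of keying functions by name plus input argument types, here is a minimal Python sketch. It is not part of the PR or of any Arrow API: the `Int64` and `Decimal` classes, the `REGISTRY` table, and `resolve_output_type` are all hypothetical names, and the decimal precision/scale rules are only one common SQL-style convention that a real specification might define differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Int64:
    """Hypothetical stand-in for a 64-bit integer type."""


@dataclass(frozen=True)
class Decimal:
    """Hypothetical stand-in for a decimal type."""
    precision: int
    scale: int


# A function's key is its name plus the kinds of its input arguments.
FunctionKey = Tuple[str, Tuple[type, ...]]


def int_divide_type(lhs: Int64, rhs: Int64) -> Int64:
    # Integer division: the output type is simply the input type.
    return Int64()


def decimal_divide_type(lhs: Decimal, rhs: Decimal) -> Decimal:
    # Assumed rule (illustrative only): scale and precision are derived
    # from the precision and scale of both inputs.
    scale = max(6, lhs.scale + rhs.precision + 1)
    return Decimal(lhs.precision - lhs.scale + rhs.scale + scale, scale)


def decimal_multiply_type(lhs: Decimal, rhs: Decimal) -> Decimal:
    # Assumed rule (illustrative only): same style of derivation as
    # decimal division, unlike anything integer division needs.
    return Decimal(lhs.precision + rhs.precision + 1, lhs.scale + rhs.scale)


# Two entries share the name "divide" but have different keys and
# unrelated output type derivations, so nothing is "overloaded".
REGISTRY: Dict[FunctionKey, Callable] = {
    ("divide", (Int64, Int64)): int_divide_type,
    ("divide", (Decimal, Decimal)): decimal_divide_type,
    ("multiply", (Decimal, Decimal)): decimal_multiply_type,
}


def resolve_output_type(name: str, *args):
    """Look up a function by (name, input type kinds) and derive its output type."""
    key = (name, tuple(type(a) for a in args))
    return REGISTRY[key](*args)
```

With this layout, `resolve_output_type("divide", Int64(), Int64())` and `resolve_output_type("divide", Decimal(10, 2), Decimal(5, 1))` hit separate entries, and a combination that was never specified (for example `("divide", (Int64, Decimal))`) raises a `KeyError` instead of being resolved implicitly.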
   
   




