westonpace commented on code in PR #34627:
URL: https://github.com/apache/arrow/pull/34627#discussion_r1145005055


##########
cpp/src/arrow/engine/substrait/options.cc:
##########
@@ -166,6 +171,57 @@ class DefaultExtensionProvider : public BaseExtensionProvider {
                                      named_tap_rel.name(), std::move(renamed_schema)));
     return RelationInfo{{std::move(decl), std::move(renamed_schema)}, std::nullopt};
   }
+
+  Result<RelationInfo> MakeSegmentedAggregateRel(
+      const ConversionOptions& conv_opts, const std::vector<DeclarationInfo>& inputs,
+      const substrait_ext::SegmentedAggregateRel& seg_agg_rel,
+      const ExtensionSet& ext_set) {
+    if (inputs.size() != 1) {
+      return Status::Invalid(
+          "substrait_ext::SegmentedAggregateRel requires a single input but got: ",
+          inputs.size());
+    }
+
+    auto input_schema = inputs[0].output_schema;
+
+    ConversionOptions conversion_options;
+
+    // store segment key fields to be used when output schema is created
+    std::vector<int> segment_key_field_ids;
+    std::vector<FieldRef> segment_keys;
+    if (seg_agg_rel.segment_groupings_size() > 0) {
+      ARROW_RETURN_NOT_OK(internal::ParseAggregateGrouping(
+          seg_agg_rel.segment_groupings(0), ext_set, conversion_options, input_schema,
+          &segment_key_field_ids, &segment_keys));
+    }
+
+    const auto& aggregate = seg_agg_rel.aggregate();
+    ARROW_ASSIGN_OR_RAISE(
+        auto decl_info,
+        internal::ParseAggregateDeclaration(

Review Comment:
   In Substrait itself we have been discouraging this kind of approach (embedding an existing relation such as AggregateRel inside a new physical relation) because:
   
    * It's too expressive - We don't consume every kind of AggregateRel (e.g. expressions in a grouping have to be direct references) and, since this is a physical relation, we should only expose what we can consume.
    * Unnecessary coupling - AggregateRel can't be changed without potentially changing every extension that embeds it, and it's not clear those extensions would always need to change.
    * Directly including the Rel itself leads to some awkwardness, like the fact that you now have multiple "inputs" (the extension relation's own input and the input of the embedded AggregateRel).
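
To make the first point concrete, here is a minimal sketch of the check a consumer has to carry when grouping expressions are general expressions. `FieldRef`, `Call`, `Expr` and `DirectReferenceKeys` are hypothetical stand-ins invented for this illustration; they are not Substrait or Arrow types, and this is not the code in the PR.

```cpp
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for illustration only (not Substrait or Arrow types).
struct FieldRef {
  int index;  // position of the referenced field in the input schema
};
struct Call {
  std::string function;  // any computed expression, e.g. a function call
};
using Expr = std::variant<FieldRef, Call>;

// Returns the referenced field indices if every grouping expression is a
// direct field reference, or std::nullopt as soon as one is not.
std::optional<std::vector<int>> DirectReferenceKeys(
    const std::vector<Expr>& grouping) {
  std::vector<int> keys;
  keys.reserve(grouping.size());
  for (const Expr& expr : grouping) {
    const FieldRef* ref = std::get_if<FieldRef>(&expr);
    if (ref == nullptr) {
      return std::nullopt;  // expressible in the plan, but not consumable
    }
    keys.push_back(ref->index);
  }
  return keys;
}
```

If the message only allowed field references in the first place, the `std::nullopt` branch would not be needed, because an unsupported grouping could not be written down.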
   
   However, yes, with a standalone message most of the parsing code would have to be duplicated, so for something internal I don't think embedding AggregateRel is unworkable. I don't have a strong opinion, but I would lean slightly towards something like...
   
   ```
   message SegmentedAggregateRel {
     repeated Expression.FieldReference grouping_keys = 1;
     repeated Expression.FieldReference segment_keys = 2;
     repeated substrait.AggregateRel.Measure measures = 3;
   }
   ```
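
As a rough illustration of what consuming such a flat message could look like, the sketch below mirrors it with plain structs and checks the keys against the single input's schema width. Everything here is an assumption made for the example: `Measure`, `SegmentedAggregate` and `Validate` are hypothetical names, field references are reduced to integer indices, and none of this is the Arrow or Substrait API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory mirror of the flat message (illustration only).
struct Measure {
  std::string function;  // name of the aggregate function
  int argument;          // index of the input field it aggregates
};
struct SegmentedAggregate {
  std::vector<int> grouping_keys;  // direct field references, by index
  std::vector<int> segment_keys;   // direct field references, by index
  std::vector<Measure> measures;
};

// Returns an empty string when every reference points into the input schema,
// otherwise a description of the first problem found.
std::string Validate(const SegmentedAggregate& rel,
                     std::size_t num_input_fields) {
  auto in_range = [num_input_fields](int index) {
    return index >= 0 && static_cast<std::size_t>(index) < num_input_fields;
  };
  for (int key : rel.grouping_keys) {
    if (!in_range(key)) {
      return "grouping key " + std::to_string(key) + " is out of range";
    }
  }
  for (int key : rel.segment_keys) {
    if (!in_range(key)) {
      return "segment key " + std::to_string(key) + " is out of range";
    }
  }
  for (const Measure& measure : rel.measures) {
    if (!in_range(measure.argument)) {
      return "measure argument " + std::to_string(measure.argument) +
             " is out of range";
    }
  }
  return "";
}
```

Because the keys can only be references, the consumer has no expression kinds to reject; the remaining validation is bounds checking against the one input.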


