westonpace commented on code in PR #33909:
URL: https://github.com/apache/arrow/pull/33909#discussion_r1096045166


##########
cpp/src/arrow/engine/substrait/options.h:
##########
@@ -83,6 +84,14 @@ class ARROW_ENGINE_EXPORT ExtensionProvider {
 
 ARROW_ENGINE_EXPORT std::shared_ptr<ExtensionProvider> 
default_extension_provider();
 
+struct ARROW_ENGINE_EXPORT NamedTapNodeOptions : public 
compute::ExecNodeOptions {
+  NamedTapNodeOptions(const std::string& name, std::shared_ptr<Schema> schema)
+      : name(name), schema(std::move(schema)) {}
+
+  std::string name;
+  std::shared_ptr<Schema> schema;
+};

Review Comment:
   I agree with @rtpsw that going too generic (protobuf.Any in and Declaration 
out) is not a good idea.  This is the exact API for the `ExtensionProvider` and 
so we already have this.  You could use this for `WriteSmoothNodeOption` and it 
would end up looking just like the extension that is in place for AsofJoinNode. 
 If that works for you then there is nothing we need to do :)
   
   So if we want to go less generic then we should look at what Acero provides 
at each step to simplify things:
   
   ### Generic Node
   
   The next step down in generality would be to remove protobuf from the 
picture using Substrait/Arrow literals:
   
   ```
   message GenericOptions {
     message Property {
       string key = 0;
       Literal value = 1;
     }
     repeated Property properties = 1;
   }
   
   message GenericNode {
     string kind = 0;
     GenericOptions options = 1;
   }
   ```
   
   The API in Arrow would be:
   
   ```
   void RegisterGenericExtensionHandler(string kind, 
function<Declaration(unordered_map<string, Scalar>)> handler)
   ```
   
   In other words, given properties as a map of arrow scalars, create a 
declaration
   
   **Pros:**
     * No need for extension author to write proto file
     * Can be used for any extension type
    
   **Cons:**
     * Need to create an entire ExecNode for your custom behavior
   
   ### Generic Mapper
   
   An even more focused approach could be to remove the need to create an exec 
node at all.  This would be focused on the 1 input in / 1 input out / no 
accumulation case (e.g. what MapNode handles).  From what I know of your write 
need this would fit this case.
   
   The Substrait would be pretty much identical to the above.
   
   ```
   message MapNode {
     string kind = 0;
     GenericOptions options = 1;
   }
   ```
   
   However, the Arrow API would be:
   
   ```
   using Mapper = function<ExecBatch(ExecBatch)>;
   void RegisterGenericMapHandler(string kind, 
function<Mapper(unordered_map<string, Scalar>)> handler)
   ```
   
   **Pros:**
    * No need to write proto file
    * No need to create an exec node
   
   **Cons:**
    * Only handles 1-in relations that don't accumulate
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to