rtpsw commented on code in PR #13375:
URL: https://github.com/apache/arrow/pull/13375#discussion_r907730758


##########
cpp/src/arrow/engine/substrait/serde.h:
##########
@@ -40,22 +41,81 @@ using ConsumerFactory = 
std::function<std::shared_ptr<compute::SinkNodeConsumer>
 
 /// \brief Deserializes a Substrait Plan message to a list of ExecNode 
declarations
 ///
+/// The output of each top-level Substrait relation will be sent to a caller 
supplied
+/// consumer function provided by consumer_factory
+///
 /// \param[in] buf a buffer containing the protobuf serialization of a 
Substrait Plan
 /// message
 /// \param[in] consumer_factory factory function for generating the node that 
consumes
 /// the batches produced by each toplevel Substrait relation
+/// \param[in] registry an extension-id-registry to use, or null for the 
default one.
 /// \param[out] ext_set_out if non-null, the extension mapping used by the 
Substrait
 /// Plan is returned here.
 /// \return a vector of ExecNode declarations, one for each toplevel relation 
in the
 /// Substrait Plan
 ARROW_ENGINE_EXPORT Result<std::vector<compute::Declaration>> DeserializePlans(
     const Buffer& buf, const ConsumerFactory& consumer_factory,
-    ExtensionSet* ext_set_out = NULLPTR);
+    const ExtensionIdRegistry* registry = NULLPTR, ExtensionSet* ext_set_out = 
NULLPTR);
 
+/// \brief Deserializes a single-relation Substrait Plan message to an 
execution plan
+///
+/// The output of each top-level Substrait relation will be sent to a caller 
supplied
+/// consumer function provided by consumer_factory
+///
+/// \param[in] buf a buffer containing the protobuf serialization of a 
Substrait Plan
+/// message
+/// \param[in] consumer_factory factory function for generating the node that 
consumes
+/// the batches produced by each toplevel Substrait relation
+/// \param[in] registry an extension-id-registry to use, or null for the 
default one.
+/// \param[out] ext_set_out if non-null, the extension mapping used by the 
Substrait
+/// Plan is returned here.
+/// \return an ExecNode corresponding to the single toplevel relation in the 
Substrait
+/// Plan
 Result<compute::ExecPlan> DeserializePlan(const Buffer& buf,
                                           const ConsumerFactory& 
consumer_factory,
+                                          const ExtensionIdRegistry* registry 
= NULLPTR,
                                           ExtensionSet* ext_set_out = NULLPTR);
 
+/// Factory function type for generating the write options of a node consuming 
the batches
+/// produced by each toplevel Substrait relation when deserializing a 
Substrait Plan.
+using WriteOptionsFactory = 
std::function<std::shared_ptr<dataset::WriteNodeOptions>()>;
+
+/// \brief Deserializes a Substrait Plan message to a list of ExecNode 
declarations
+///
+/// The output of each top-level Substrait relation will be written to a 
filesystem.
+/// `write_options_factory` can be used to control write behavior.
+///
+/// \param[in] buf a buffer containing the protobuf serialization of a 
Substrait Plan
+/// message
+/// \param[in] write_options_factory factory function for generating the write 
options of
+/// a node consuming the batches produced by each toplevel Substrait relation
+/// \param[in] registry an extension-id-registry to use, or null for the 
default one.
+/// \param[out] ext_set_out if non-null, the extension mapping used by the 
Substrait
+/// Plan is returned here.
+/// \return a vector of ExecNode declarations, one for each toplevel relation 
in the
+/// Substrait Plan
+ARROW_ENGINE_EXPORT Result<std::vector<compute::Declaration>> DeserializePlans(
+    const Buffer& buf, const WriteOptionsFactory& write_options_factory,
+    const ExtensionIdRegistry* registry = NULLPTR, ExtensionSet* ext_set_out = 
NULLPTR);
+
+/// \brief Deserializes a single-relation Substrait Plan message to an 
execution plan
+///
+/// The output of the single Substrait relation will be written to a 
filesystem.
+/// `write_options_factory` can be used to control write behavior.
+///
+/// \param[in] buf a buffer containing the protobuf serialization of a 
Substrait Plan
+/// message
+/// \param[in] write_options_factory factory function for generating the write 
options of
+/// a node consuming the batches produced by each toplevel Substrait relation
+/// \param[in] registry an extension-id-registry to use, or null for the 
default one.
+/// \param[out] ext_set_out if non-null, the extension mapping used by the 
Substrait
+/// Plan is returned here.
+/// \return a vector of ExecNode declarations, one for each toplevel relation 
in the
+/// Substrait Plan
+ARROW_ENGINE_EXPORT Result<compute::ExecPlan> DeserializePlan(
+    const Buffer& buf, const WriteOptionsFactory& write_options_factory,
+    const ExtensionIdRegistry* registry = NULLPTR, ExtensionSet* ext_set_out = 
NULLPTR);

Review Comment:
   I think changing from factories to instances would lead to trouble. 
`ConsumerFactory` (and `WriteOptionsFactory` too) is called once per `PlanRel` 
in the Substrait plan in order to get a distinct instance per invocation, 
despite the factory taking no arguments, which often leads to returning equal 
instances. However, using an instance instead of a factory would pretty much 
force reusing the same instance with each `PlanRel`, which isn't right. To 
avoid reusing such an instance, it would have to support safe duplication, 
which would be more annoying for the caller to implement than a factory.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to