westonpace commented on code in PR #33623:
URL: https://github.com/apache/arrow/pull/33623#discussion_r1068405976
##########
cpp/src/arrow/engine/substrait/util.h:
##########
@@ -38,10 +38,24 @@ namespace engine {
using PythonTableProvider =
std::function<Result<std::shared_ptr<Table>>(const std::vector<std::string>&)>;
+/// \brief Utility method to run a Substrait plan
+/// \param substrait_buffer The plan to run, must be in binary protobuf format
+/// \param registry A registry of extension functions to make available to the plan.
+/// If null then the default registry will be used.
+/// \param memory_pool The memory pool the plan should use to make allocations.
+/// \param func_registry A registry of functions used for execution expressions.
+/// `registry` maps from Substrait function IDs to "names". These
+/// names will be provided to `func_registry` to get the actual
+/// kernel.
+/// \param conversion_options Options to control plan deserialization
+/// \param use_threads If true then the CPU thread pool will be used for CPU work. If
+/// false then all work will be done on the calling thread.
+/// \return A record batch reader that will read out the results
ARROW_ENGINE_EXPORT Result<std::shared_ptr<RecordBatchReader>>
ExecuteSerializedPlan(
const Buffer& substrait_buffer, const ExtensionIdRegistry* registry = NULLPTR,
compute::FunctionRegistry* func_registry = NULLPTR,
- const ConversionOptions& conversion_options = {});
+ const ConversionOptions& conversion_options = {}, bool use_threads = true,
Review Comment:
In general, our defaults maximize performance at whatever cost in CPU and RAM.
I think `use_threads = true` is OK here: users usually want things to run as
quickly as possible.
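The trade-off behind a `use_threads = true` default can be sketched in plain C++. `RunTasks` below is a hypothetical helper, not part of Arrow, and `std::async` stands in for Arrow's CPU thread pool; it only illustrates the pattern of defaulting to parallel execution while letting callers opt into doing all work on the calling thread.

```cpp
#include <functional>
#include <future>
#include <vector>

// Hypothetical helper (not an Arrow API). With use_threads = true (the
// default) each task runs asynchronously so multiple cores can be used;
// with use_threads = false every task runs serially on the calling thread.
// Results are returned in task order either way.
//
// Example: RunTasks(tasks) runs in parallel, RunTasks(tasks, false) does not.
std::vector<int> RunTasks(const std::vector<std::function<int()>>& tasks,
                          bool use_threads = true) {
  std::vector<int> results;
  results.reserve(tasks.size());

  if (!use_threads) {
    // Opt-out path: no extra threads, predictable single-core behavior.
    for (const auto& task : tasks) results.push_back(task());
    return results;
  }

  // Default path: launch every task on its own thread (std::launch::async),
  // then collect the results in submission order.
  std::vector<std::future<int>> futures;
  futures.reserve(tasks.size());
  for (const auto& task : tasks) {
    futures.push_back(std::async(std::launch::async, task));
  }
  for (auto& f : futures) results.push_back(f.get());
  return results;
}
```

Both paths produce the same results; the flag only decides whether extra CPU resources are spent to get them sooner, which is why defaulting to the faster path matches the usual user expectation.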