westonpace commented on code in PR #33909:
URL: https://github.com/apache/arrow/pull/33909#discussion_r1101863001
##########
cpp/src/arrow/engine/substrait/options.h:
##########
@@ -93,6 +114,10 @@ struct ARROW_ENGINE_EXPORT ConversionOptions {
/// The default behavior will return an invalid status if the plan has any
/// named table relations.
NamedTableProvider named_table_provider = kDefaultNamedTableProvider;
+ /// \brief A custom strategy to be used for mapping a tap kind to a function name
+ ///
+ /// The default mapper returns a declaration whose factory name is equal to the tap kind
+ NamedTapKindMapper named_tap_mapper = kDefaultNamedTapKindMapper;
Review Comment:
How do you feel about renaming this to `NamedTapProvider` for consistency with `NamedTableProvider`?
##########
cpp/src/arrow/engine/substrait/options.h:
##########
@@ -76,13 +88,22 @@ class ARROW_ENGINE_EXPORT ExtensionProvider {
public:
static std::shared_ptr<ExtensionProvider> kDefaultExtensionProvider;
virtual ~ExtensionProvider() = default;
- virtual Result<RelationInfo> MakeRel(const std::vector<DeclarationInfo>& inputs,
+ virtual Result<RelationInfo> MakeRel(const ConversionOptions& conv_opts,
+ const std::vector<DeclarationInfo>& inputs,
const ExtensionDetails& ext_details,
const ExtensionSet& ext_set) = 0;
};
ARROW_ENGINE_EXPORT std::shared_ptr<ExtensionProvider> default_extension_provider();
+struct ARROW_ENGINE_EXPORT NamedTapNodeOptions : public compute::ExecNodeOptions {
Review Comment:
We can pass the schema and name directly instead of using this class. If we
still want to wrap them in some kind of object (to reduce the number of
arguments passed to the mapper?), then we should include `kind`, not extend
`ExecNodeOptions`, and call it something like `NamedTap` or `NamedTapOptions`.
In other words, we shouldn't try to form the node options from the protobuf.
That is the mapper's job. We just need to pass whatever was in the
protobuf along to the mapper and let the mapper decide which node options
are appropriate.
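A minimal sketch of the wrapper-object option, assuming simplified stand-ins for `Schema`, `ExecNodeOptions`, and `Declaration` (these are NOT the real Arrow classes) and `std::optional` in place of `Result`. `NamedTapOptions` is the name floated in the comment; `FileSinkOptions`, `MapNamedTap`, and the `file_sink` kind are hypothetical. The converter only forwards `kind`, `name`, and `schema`; the mapper alone decides which node options to build.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// --- Simplified stand-ins; NOT the real Arrow classes ---
struct Schema {
  std::vector<std::string> field_names;
};
struct ExecNodeOptions {
  virtual ~ExecNodeOptions() = default;
};
struct Declaration {
  std::string factory_name;
  std::vector<Declaration> inputs;  // real Declaration::Input is richer
  std::shared_ptr<ExecNodeOptions> options;
};

// Hypothetical plain carrier for the named-tap fields found in the
// protobuf. It carries `kind` and deliberately does NOT extend
// ExecNodeOptions.
struct NamedTapOptions {
  std::string kind;
  std::string name;
  std::shared_ptr<Schema> schema;
};

// Hypothetical node options that only the mapper knows how to build.
struct FileSinkOptions : ExecNodeOptions {
  std::string path;
  std::shared_ptr<Schema> schema;
};

// The mapper, not the Substrait converter, turns the raw tap fields into
// node options. std::nullopt stands in for an error Status.
std::optional<Declaration> MapNamedTap(const NamedTapOptions& tap,
                                       std::vector<Declaration> inputs) {
  if (tap.kind != "file_sink") {
    return std::nullopt;  // unknown tap kind
  }
  auto options = std::make_shared<FileSinkOptions>();
  options->path = "/tmp/" + tap.name + ".parquet";
  options->schema = tap.schema;
  return Declaration{"file_sink", std::move(inputs), std::move(options)};
}
```

Because the carrier is a plain struct, supporting a new tap kind would only touch the mapper, not the Substrait converter.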
##########
cpp/src/arrow/engine/substrait/options.h:
##########
@@ -67,6 +69,16 @@ using NamedTableProvider =
std::function<Result<compute::Declaration>(const std::vector<std::string>&)>;
static NamedTableProvider kDefaultNamedTableProvider;
+using NamedTapKindMapper = std::function<Result<compute::Declaration>(
+ const std::string&, std::vector<compute::Declaration::Input>,
+ std::shared_ptr<compute::ExecNodeOptions>)>;
Review Comment:
This looks great. I think we can make one more small tweak.
```suggestion
using NamedTapKindMapper = std::function<Result<compute::Declaration>(
const std::string&, std::vector<compute::Declaration::Input>,
std::shared_ptr<Schema>, std::string name)>;
```
##########
cpp/src/arrow/engine/substrait/options.h:
##########
@@ -67,6 +69,16 @@ using NamedTableProvider =
std::function<Result<compute::Declaration>(const std::vector<std::string>&)>;
static NamedTableProvider kDefaultNamedTableProvider;
+using NamedTapKindMapper = std::function<Result<compute::Declaration>(
+ const std::string&, std::vector<compute::Declaration::Input>,
+ std::shared_ptr<compute::ExecNodeOptions>)>;
+static NamedTapKindMapper kDefaultNamedTapKindMapper =
Review Comment:
Do you think we will want to make the default named tap mapper (or the
default named table provider, for that matter) configurable?
##########
cpp/src/arrow/engine/substrait/options.h:
##########
@@ -67,6 +69,16 @@ using NamedTableProvider =
std::function<Result<compute::Declaration>(const std::vector<std::string>&)>;
static NamedTableProvider kDefaultNamedTableProvider;
+using NamedTapKindMapper = std::function<Result<compute::Declaration>(
+ const std::string&, std::vector<compute::Declaration::Input>,
+ std::shared_ptr<compute::ExecNodeOptions>)>;
+static NamedTapKindMapper kDefaultNamedTapKindMapper =
+ [](const std::string& kind, std::vector<compute::Declaration::Input> inputs,
+ std::shared_ptr<compute::ExecNodeOptions> options)
+ -> Result<compute::Declaration> {
+ return compute::Declaration(kind, inputs, options);
Review Comment:
If we remove `NamedTapNodeOptions`, as some of my other comments suggest,
then I suppose there is no longer a meaningful default mapper. That said, I
don't know of any use case where this default would be correct. Perhaps we
should just return an error here, as we do with named tables?
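A minimal sketch of a default mapper that rejects every named tap, written against the signature proposed in the suggestion above (kind, inputs, schema, name). It assumes simplified stand-ins for `Schema`, `ExecNodeOptions`, `Declaration`, and `Result` (the `Status` struct and `DeclarationResult` alias are hypothetical, NOT Arrow's). In the hunk above, `kDefaultNamedTableProvider` is a default-constructed (empty) `std::function`, so another option is to leave the tap mapper empty too and let the caller report the error; this sketch makes the error explicit instead.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <variant>
#include <vector>

// --- Simplified stand-ins; NOT the real Arrow classes ---
struct Schema {
  std::vector<std::string> field_names;
};
struct ExecNodeOptions {
  virtual ~ExecNodeOptions() = default;
};
struct Declaration {
  std::string factory_name;
  std::vector<Declaration> inputs;  // real Declaration::Input is richer
  std::shared_ptr<ExecNodeOptions> options;
};
struct Status {
  std::string message;
};
// Stand-in for Result<Declaration>: either a declaration or an error.
using DeclarationResult = std::variant<Declaration, Status>;

// Signature from the review suggestion: kind, inputs, schema, name.
using NamedTapKindMapper = std::function<DeclarationResult(
    const std::string&, std::vector<Declaration>, std::shared_ptr<Schema>,
    std::string name)>;

// Default mapper: reject every named tap, mirroring how named tables are
// rejected when no provider has been configured.
static NamedTapKindMapper kDefaultNamedTapKindMapper =
    [](const std::string& kind, std::vector<Declaration> /*inputs*/,
       std::shared_ptr<Schema> /*schema*/,
       std::string name) -> DeclarationResult {
  return Status{"Invalid: plan contains named tap '" + name + "' of kind '" +
                kind + "', but no named tap mapper was configured"};
};
```

Users who need named taps would then be forced to supply their own mapper, which matches the named-table behavior described in the `options.h` doc comment.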
--
This is an automated message from the Apache Git Service.