westonpace commented on code in PR #33909:
URL: https://github.com/apache/arrow/pull/33909#discussion_r1090884638
##########
cpp/src/arrow/engine/substrait/options.cc:
##########
@@ -28,6 +28,19 @@
namespace arrow {
namespace engine {
+namespace {
+
+std::vector<compute::Declaration::Input> MakeInputDeclarations(
Review Comment:
Maybe call this `MakeDeclarationInputs`? An "input declaration" sounds like
a source node.
##########
cpp/src/arrow/engine/substrait/options.cc:
##########
@@ -28,6 +28,19 @@
namespace arrow {
namespace engine {
+namespace {
+
+std::vector<compute::Declaration::Input> MakeInputDeclarations(
+ const std::vector<DeclarationInfo> inputs) {
Review Comment:
```suggestion
const std::vector<DeclarationInfo>& inputs) {
```
##########
cpp/proto/substrait/extension_rels.proto:
##########
@@ -44,3 +44,9 @@ message AsOfJoinRel {
repeated .substrait.Expression by = 2;
}
}
+
+message NamedTapRel {
+ string kind = 1;
+ string name = 2;
+ repeated string columns = 3;
+}
Review Comment:
Let's add some comments explaining what these are perhaps.
##########
cpp/src/arrow/engine/substrait/options.cc:
##########
@@ -111,15 +129,43 @@ class DefaultExtensionProvider : public
BaseExtensionProvider {
compute::AsofJoinNodeOptions asofjoin_node_opts{std::move(input_keys),
tolerance};
// declaration
- std::vector<compute::Declaration::Input> input_decls(inputs.size());
- for (size_t i = 0; i < inputs.size(); i++) {
- input_decls[i] = inputs[i].declaration;
- }
+ auto input_decls = MakeInputDeclarations(inputs);
return RelationInfo{
{compute::Declaration("asofjoin", input_decls,
std::move(asofjoin_node_opts)),
std::move(schema)},
std::move(field_output_indices)};
}
+
+ Result<RelationInfo> MakeNamedTapRel(const std::vector<DeclarationInfo>&
inputs,
+ const substrait_ext::NamedTapRel&
named_tap_rel,
+ const ExtensionSet& ext_set) {
+ if (inputs.size() != 1) {
+ return Status::Invalid(
+ "substrait_ext::NamedTapNode requires a single table but got: ",
inputs.size());
+ }
+
+ auto schema = inputs[0].output_schema;
+ int num_fields = schema->num_fields();
+ if (named_tap_rel.columns_size() != num_fields) {
+ return Status::Invalid("Got ", named_tap_rel.columns_size(),
+ " NamedTapRel columns but expected ", num_fields);
+ }
+ int i = 0;
+ FieldVector fields(static_cast<size_t>(num_fields));
+ for (const auto& column : named_tap_rel.columns()) {
+ fields[i] = field(column, schema->field(i)->type());
+ i++;
+ }
+ auto renamed_schema = arrow::schema(std::move(fields));
Review Comment:
There is a `Field::WithName` method. You could maybe move this into a
`Schema::WithNames` method. I wrote one once but it was on a PR that didn't
end up merging:
```
Result<std::shared_ptr<Schema>> Schema::WithNames(
const std::vector<std::string>& names) const {
if (names.size() != impl_->fields_.size()) {
return Status::Invalid("attempted to rename schema with ",
impl_->fields_.size(),
" fields but only ", names.size(), " new names
were given");
}
FieldVector new_fields;
auto names_itr = names.begin();
for (const auto& field : impl_->fields_) {
new_fields.push_back(field->WithName(*names_itr));
names_itr++;
}
return schema(std::move(new_fields));
}
```
##########
cpp/src/arrow/engine/substrait/options.h:
##########
@@ -83,6 +84,14 @@ class ARROW_ENGINE_EXPORT ExtensionProvider {
ARROW_ENGINE_EXPORT std::shared_ptr<ExtensionProvider>
default_extension_provider();
+struct ARROW_ENGINE_EXPORT NamedTapNodeOptions : public
compute::ExecNodeOptions {
+ NamedTapNodeOptions(const std::string& name, std::shared_ptr<Schema> schema)
+ : name(name), schema(std::move(schema)) {}
+
+ std::string name;
+ std::shared_ptr<Schema> schema;
+};
Review Comment:
I'm not sure what the plan is here. I thought originally the goal was to
mimic the named table. In that case I'd expect to see something like
`NamedTableProvider` (e.g. `NamedTapProvider`) that translates names into
actual nodes (e.g. a `TeeNode`).
In that case I wouldn't expect there to be any "named tap" equivalent in
Acero. I'm not sure what `NamedTapNode` and `AddPassFactory` are providing in
this PR?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]