jvanstraten commented on code in PR #12852:
URL: https://github.com/apache/arrow/pull/12852#discussion_r856573840
##########
cpp/src/arrow/engine/substrait/plan_internal.cc:
##########
@@ -108,13 +108,23 @@ void SetElement(size_t i, const Element& element,
std::vector<T>* vector) {
}
(*vector)[i] = static_cast<T>(element);
}
+
+template <typename Element, typename key, typename value>
+void SetMapElement(key i, const Element& element, std::unordered_map<key,
value>* map) {
+ DCHECK_LE(i, 1 << 20);
+ if (i >= map->size()) {
+ map->reserve(i + 1);
+ }
+ (*map)[i] = static_cast<value>(element);
+}
+
} // namespace
Result<ExtensionSet> GetExtensionSetFromPlan(const substrait::Plan& plan,
ExtensionIdRegistry* registry) {
- std::vector<util::string_view> uris;
+ std::unordered_map<uint32_t, util::string_view> uris;
for (const auto& uri : plan.extension_uris()) {
- SetElement(uri.extension_uri_anchor(), uri.uri(), &uris);
+ SetMapElement(uri.extension_uri_anchor(), uri.uri(), &uris);
}
// NOTE: it's acceptable to use views to memory owned by plan;
ExtensionSet::Make
Review Comment:
I'm inclined to agree, because personally I still don't understand why Arrow
should have producer code at all. The best argument I've heard yet is to
communicate plans across language boundaries within Arrow, but it'd be so much
easier to just make protobuf messages that actually correspond to the execution
engine data structures and type system for that. Heck, throw out protobuf in
favor of something that links sanely. The second best is roundtrip testing, but
is all the code needed to not only produce but also bit-by-bit *reproduce* the
same plan really worth a factor 2 to 3 in code size?
But anyway, what you're describing as a non-goal for those functions is
exactly how they are currently used. In practice the anchors will all already
exist, since the producer code is AFAIK incomplete and can only reproduce an
incoming plan, so the new-anchor-generation code would never trigger under
normal circumstances. IMO that's no excuse for that function to just silently
override the same anchor over and over again in that case, though. It should at
least detect it and throw an error.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]