jvanstraten commented on code in PR #12852:
URL: https://github.com/apache/arrow/pull/12852#discussion_r856573840


##########
cpp/src/arrow/engine/substrait/plan_internal.cc:
##########
@@ -108,13 +108,23 @@ void SetElement(size_t i, const Element& element, 
std::vector<T>* vector) {
   }
   (*vector)[i] = static_cast<T>(element);
 }
+
+template <typename Element, typename key, typename value>
+void SetMapElement(key i, const Element& element, std::unordered_map<key, 
value>* map) {
+  DCHECK_LE(i, 1 << 20);
+  if (i >= map->size()) {
+    map->reserve(i + 1);
+  }
+  (*map)[i] = static_cast<value>(element);
+}
+
 }  // namespace
 
 Result<ExtensionSet> GetExtensionSetFromPlan(const substrait::Plan& plan,
                                              ExtensionIdRegistry* registry) {
-  std::vector<util::string_view> uris;
+  std::unordered_map<uint32_t, util::string_view> uris;
   for (const auto& uri : plan.extension_uris()) {
-    SetElement(uri.extension_uri_anchor(), uri.uri(), &uris);
+    SetMapElement(uri.extension_uri_anchor(), uri.uri(), &uris);
   }
 
   // NOTE: it's acceptable to use views to memory owned by plan; 
ExtensionSet::Make

Review Comment:
   I'm inclined to agree, because personally I still don't understand why Arrow 
should have producer code at all. The best argument I've heard yet is to 
communicate plans across language boundaries within Arrow, but it'd be so much 
easier to just make protobuf messages that actually correspond to the execution 
engine data structures and type system for that. Heck, throw out protobuf in 
favor of something that links sanely. The second best is roundtrip testing, but 
is all the code needed to not only produce but also bit-by-bit *reproduce* the 
same plan really worth a factor 2 to 3 in code size?
   
   But anyway, what you're describing as a non-goal for those functions is 
exactly how they are currently used. In practice the anchors will all already 
exist, since the producer code is AFAIK incomplete and can only reproduce an 
incoming plan, so the new-anchor-generation code would never trigger under 
normal circumstances. IMO that's no excuse for that function to just silently 
override the same anchor over and over again in that case, though. It should at 
least detect it and throw an error.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to