niyue commented on code in PR #38116:
URL: https://github.com/apache/arrow/pull/38116#discussion_r1367578100
##########
cpp/src/gandiva/function_registry.cc:
##########
@@ -16,18 +16,23 @@
// under the License.
#include "gandiva/function_registry.h"
+
+#include <iterator>
+#include <utility>
+#include <vector>
+
+#include "arrow/util/logging.h"
#include "gandiva/function_registry_arithmetic.h"
#include "gandiva/function_registry_datetime.h"
#include "gandiva/function_registry_hash.h"
#include "gandiva/function_registry_math_ops.h"
#include "gandiva/function_registry_string.h"
#include "gandiva/function_registry_timestamp_arithmetic.h"
-#include <iterator>
-#include <utility>
-#include <vector>
-
namespace gandiva {
+constexpr uint32_t kMaxFunctionSignatures = 2048;
+
+FunctionRegistry::FunctionRegistry() {
pc_registry_.reserve(kMaxFunctionSignatures); }
FunctionRegistry::iterator FunctionRegistry::begin() const {
Review Comment:
There is one caveat here, the `pc_registry_map_` stores the signatures with
their pointers and key/values:
```
for (auto& elem : pc_registry_) {
for (auto& func_signature : elem.signatures()) {
map.emplace(&func_signature, &elem);
}
}
```
It is not a problem previously if the size of function registry is known up
front, however, here we allows users to add new functions, which makes the
`pc_registry_` std::vector to be resized, and the pointer stored in
`pc_registry_map_` will be invalidated after `pc_registry_` is resized.
Initially, I would like to switch `pc_registry_` from `std::vector` to
`std::list` to guarantee pointer stability, however, I find currently
`FunctionRegistry` provides an `iterator` API `begin/end/back` which returns
`iterator = const NativeFunction *`, this assumes the backend container is
`std::vector`, which stores these `NativeFunction` in a continuous manner.
This iterator API is used by `ExpressionRegistry`, which is likely used by
applications (e.g. the `GetRegisteredFunctionSignatures` is never used
internally in Gandiva project).
To avoid breaking existing API, currently, I have to pre-allocate the
`pc_registry_` vector so that it won't be resized, but this limits the registry
to have capacity 2048 (which is likely enough in most cases). Please advise if
there is better approach for addressing this issue. Thanks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]