pitrou commented on code in PR #46261: URL: https://github.com/apache/arrow/pull/46261#discussion_r2123685108
########## cpp/src/arrow/compute/kernels/api.h: ########## @@ -0,0 +1,37 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include "arrow/compute/visibility.h" +#include "arrow/status.h" + +namespace arrow::compute { + +namespace internal { + +/// \brief Register all compute kernels. +Status RegisterComputeKernels(); Review Comment: Is there a reason this is exposed in a `.h` file? It doesn't seem used outside of `Initialize`. ########## python/pyarrow/includes/libarrow.pxd: ########## @@ -2885,6 +2885,9 @@ cdef extern from "arrow/compute/api.h" namespace "arrow::compute" nogil: cdef c_string ToString(DatumType kind) +cdef extern from "arrow/compute/kernels/api.h" namespace "arrow::compute" nogil: + CStatus CInitializeCompute " arrow::compute::Initialize"() Review Comment: Not sure we need a "C" here :)) ```suggestion CStatus InitializeCompute " arrow::compute::Initialize"() ``` ########## r/src/compute.cpp: ########## @@ -618,6 +622,13 @@ SEXP compute__CallFunction(std::string func_name, cpp11::list args, cpp11::list // [[arrow::export]] std::vector<std::string> compute__GetFunctionNames() { +#if ARROW_VERSION_MAJOR >= 21 + auto compute_init_status_ = arrow::compute::Initialize(); Review Comment: Why do we do this in `GetFunctionNames` specifically? Should add a comment explaining why. ########## ruby/red-arrow/lib/arrow/loader.rb: ########## @@ -31,6 +31,7 @@ def post_load(repository, namespace) require_extension_library gc_guard self.class.start_callback_dispatch_thread + @base_module.compute_initialize Review Comment: @kou Does this look ok? ########## docs/source/cpp/compute.rst: ########## @@ -42,6 +43,16 @@ whether the inputs are integral or floating-point). Functions are stored in a global :class:`FunctionRegistry` where they can be looked up by name. +Compute Initialization +---------------------- + +The compute library requires a call to :func:`arrow::compute::Initialize` +in order to register the individual functions into the :class:`FunctionRegistry`, otherwise +only the functions required for Arrow core functionality will be available \(1). + +\(1) The set of functions required for Arrow core functionality are an implementation detail +of the library, and should not be considered stable. Review Comment: Nits ```suggestion The compute library requires a call to :func:`arrow::compute::Initialize` in order to register the individual functions into the global :class:`FunctionRegistry`, otherwise only the functions required for Arrow core functionality will be available. .. note:: The set of functions required for Arrow core functionality are an implementation detail of the library, and should not be considered stable. ``` ########## r/src/compute.cpp: ########## @@ -618,6 +622,13 @@ SEXP compute__CallFunction(std::string func_name, cpp11::list args, cpp11::list // [[arrow::export]] std::vector<std::string> compute__GetFunctionNames() { +#if ARROW_VERSION_MAJOR >= 21 + auto compute_init_status_ = arrow::compute::Initialize(); + if (!compute_init_status_.ok()) { + cpp11::stop("Failed to initialize compute functions: %s", + compute_init_status_.ToString().c_str()); + } Review Comment: Any reason not to reuse `StopIfNotOk`? ########## cpp/src/arrow/compute/kernels/api.h: ########## @@ -0,0 +1,37 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include "arrow/compute/visibility.h" +#include "arrow/status.h" + +namespace arrow::compute { + +namespace internal { + +/// \brief Register all compute kernels. +Status RegisterComputeKernels(); + +} // namespace internal + +/// \brief Initialize the compute module. +/// +/// Registers the compute kernel functions to be available on the FunctionRegistry. Review Comment: Nits ```suggestion /// \brief Initialize the compute module. /// /// Register the compute kernel functions to be available on the /// global FunctionRegistry. ``` ########## python/pyarrow/_compute.pyx: ########## @@ -70,6 +70,10 @@ def _forbid_instantiation(klass, subclasses_instead=True): raise TypeError(msg) +# Call to initialize the compute module (register kernels) on import +check_status(CInitializeCompute()) Review Comment: Nit: can you move this either towards the top or the bottom of the file, instead of hidden between various function declarations? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org