pitrou commented on code in PR #38116:
URL: https://github.com/apache/arrow/pull/38116#discussion_r1363468262
##########
cpp/src/gandiva/engine.cc:
##########
@@ -248,11 +250,11 @@ Status Engine::LoadPreCompiledIR() {
Status::CodeGenError("Could not load module from IR: ",
buffer_or_error.getError().message()));
- std::unique_ptr<llvm::MemoryBuffer> buffer =
std::move(buffer_or_error.get());
+ auto buffer = std::move(buffer_or_error.get());
/// Parse the IR module.
- llvm::Expected<std::unique_ptr<llvm::Module>> module_or_error =
- llvm::getOwningLazyBitcodeModule(std::move(buffer), *context());
+ auto module_or_error = llvm::getOwningLazyBitcodeModule(std::move(buffer),
*context());
Review Comment:
Can we keep the explicit type declarations here and above? They are helpful
to people who don't know the LLVM API.
##########
cpp/src/gandiva/llvm_external_ir_store.h:
##########
@@ -0,0 +1,40 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <arrow/status.h>
+#include <gandiva/visibility.h>
+#include <llvm/Support/MemoryBuffer.h>
+#include <memory>
+#include <vector>
+
+namespace gandiva {
+using arrow::Status;
+
+class GANDIVA_EXPORT LLVMExternalIRStore {
+ public:
+ /// \brief add an LLVM IR to the store from a given bitcode file path
+ static Status Add(const std::string& bitcode_file_path);
Review Comment:
Why are all these methods static? It would be much more flexible if this was
a proper class that one can instantiate. Is there a technical constraint that
mandates this?
##########
cpp/src/gandiva/llvm_generator_test.cc:
##########
@@ -17,6 +17,7 @@
#include "gandiva/llvm_generator.h"
+#include <filesystem>
Review Comment:
Is this used?
##########
cpp/src/gandiva/llvm_external_ir_store.h:
##########
@@ -0,0 +1,40 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <arrow/status.h>
+#include <gandiva/visibility.h>
+#include <llvm/Support/MemoryBuffer.h>
+#include <memory>
+#include <vector>
+
+namespace gandiva {
+using arrow::Status;
+
+class GANDIVA_EXPORT LLVMExternalIRStore {
Review Comment:
Nit, but is "IR" the customary term here? LLVM has textual IR and binary
bitcode, is "IR" usable for both?
##########
cpp/src/gandiva/llvm_external_ir_store.cc:
##########
@@ -0,0 +1,52 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "gandiva/llvm_external_ir_store.h"
+
+#include <llvm/Bitcode/BitcodeReader.h>
+
+namespace gandiva {
+
+static std::vector<std::unique_ptr<llvm::MemoryBuffer>>&
get_stored_memory_buffers() {
+ static std::vector<std::unique_ptr<llvm::MemoryBuffer>> memory_buffers;
+ return memory_buffers;
+}
+
+Status LLVMExternalIRStore::Add(const std::string& bitcode_file_path) {
+ auto buffer_or_error = llvm::MemoryBuffer::getFile(bitcode_file_path);
+
+ ARROW_RETURN_IF(!buffer_or_error,
+ Status::CodeGenError("Could not load module from IR file: ",
Review Comment:
Ah, this isn't even parsing the file, just loading it, right? So it
shouldn't be a codegen error at all...
##########
cpp/src/gandiva/engine.h:
##########
@@ -93,6 +93,9 @@ class GANDIVA_EXPORT Engine {
/// the main module.
Status LoadPreCompiledIR();
+ // load external pre-compiled IR modules from LLVMIRStore
+ Status LoadExternalPreCompiledIR();
Review Comment:
This would be more flexible if it took a `LLVMIRStore` instance as argument,
no?
##########
cpp/src/gandiva/llvm_generator_test.cc:
##########
@@ -115,4 +117,14 @@ TEST_F(TestLLVMGenerator, TestAdd) {
EXPECT_EQ(out_bitmap, 0ULL);
}
+TEST_F(TestLLVMGenerator, VerifyExtendedPCFunctions) {
+ ARROW_EXPECT_OK(LoadTestLLVMIR());
Review Comment:
Will this leak in other tests?
##########
cpp/src/gandiva/engine.cc:
##########
@@ -274,6 +277,27 @@ Status Engine::LoadPreCompiledIR() {
return Status::OK();
}
+Status Engine::LoadExternalPreCompiledIR() {
+ auto const& buffers = LLVMExternalIRStore::GetIRBuffers();
+ for (auto const& buffer : buffers) {
+ auto module_or_error = llvm::parseBitcodeFile(buffer->getMemBufferRef(),
*context());
+ if (!module_or_error) {
+ std::string str;
+ llvm::raw_string_ostream stream(str);
+ stream << module_or_error.takeError();
+ return Status::CodeGenError("Failed to parse bitcode file, error: " +
stream.str());
+ }
+ auto ir_module = std::move(module_or_error.get());
+
+ ARROW_RETURN_IF(llvm::verifyModule(*ir_module, &llvm::errs()),
+ Status::CodeGenError("verify of IR Module failed"));
+ ARROW_RETURN_IF(llvm::Linker::linkModules(*module_, std::move(ir_module)),
+ Status::CodeGenError("failed to link IR Modules"));
Review Comment:
Is there a way to get a richer error description from LLVM here? Saying that
it failed without saying why is not useful to the caller...
##########
cpp/src/gandiva/llvm_external_ir_store.h:
##########
@@ -0,0 +1,40 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <arrow/status.h>
+#include <gandiva/visibility.h>
+#include <llvm/Support/MemoryBuffer.h>
+#include <memory>
+#include <vector>
Review Comment:
Can we split headers by category? stdlib, third-party and then Arrow/Gandiva.
For example:
```suggestion
#include <memory>
#include <vector>
#include <llvm/Support/MemoryBuffer.h>
#include "arrow/status.h"
#include "gandiva/visibility.h"
```
##########
cpp/src/gandiva/llvm_external_ir_store.cc:
##########
@@ -0,0 +1,52 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "gandiva/llvm_external_ir_store.h"
+
+#include <llvm/Bitcode/BitcodeReader.h>
+
+namespace gandiva {
+
+static std::vector<std::unique_ptr<llvm::MemoryBuffer>>&
get_stored_memory_buffers() {
Review Comment:
Mostly, we try to avoid non-const references in function signatures. This
should probably return a pointer to the vector.
##########
cpp/src/gandiva/tests/test_util.h:
##########
@@ -101,4 +103,16 @@ static inline std::shared_ptr<Configuration>
TestConfiguration() {
return builder.DefaultConfiguration();
}
+#ifndef GANDIVA_EXTENSION_TEST_DIR
+#define GANDIVA_EXTENSION_TEST_DIR "."
+#endif
+
+static inline Status LoadTestLLVMIR() {
Review Comment:
Instead of inlining everything here, it would be better to have a
`test_util.cc` (if that doesn't already exist)...
##########
cpp/src/gandiva/engine.cc:
##########
@@ -274,6 +277,27 @@ Status Engine::LoadPreCompiledIR() {
return Status::OK();
}
+Status Engine::LoadExternalPreCompiledIR() {
+ auto const& buffers = LLVMExternalIRStore::GetIRBuffers();
+ for (auto const& buffer : buffers) {
+ auto module_or_error = llvm::parseBitcodeFile(buffer->getMemBufferRef(),
*context());
+ if (!module_or_error) {
+ std::string str;
+ llvm::raw_string_ostream stream(str);
+ stream << module_or_error.takeError();
+ return Status::CodeGenError("Failed to parse bitcode file, error: " +
stream.str());
Review Comment:
Neat, but why isn't there a helper function that automates this error
conversion from LLVM to Status?
##########
cpp/src/gandiva/llvm_external_ir_store.cc:
##########
@@ -0,0 +1,52 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "gandiva/llvm_external_ir_store.h"
+
+#include <llvm/Bitcode/BitcodeReader.h>
+
+namespace gandiva {
+
+static std::vector<std::unique_ptr<llvm::MemoryBuffer>>&
get_stored_memory_buffers() {
+ static std::vector<std::unique_ptr<llvm::MemoryBuffer>> memory_buffers;
+ return memory_buffers;
+}
+
+Status LLVMExternalIRStore::Add(const std::string& bitcode_file_path) {
+ auto buffer_or_error = llvm::MemoryBuffer::getFile(bitcode_file_path);
+
+ ARROW_RETURN_IF(!buffer_or_error,
+ Status::CodeGenError("Could not load module from IR file: ",
Review Comment:
Is it a codegen error or an IO error? Is there a way to get richer
conversion from LLVM error to Arrow status?
(a malformed bitcode file shouldn't be the same error as a non-existent
file, for example)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]