This is an automated email from the ASF dual-hosted git repository.
kevingurney pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 82045527b7 GH-41654: [MATLAB] Add new `arrow.c.Schema` MATLAB class
which wraps a C Data Interface format `ArrowSchema` C struct (#41674)
82045527b7 is described below
commit 82045527b775d3847d3f34ebb51af852c76a2e44
Author: Kevin Gurney <[email protected]>
AuthorDate: Wed May 15 10:49:29 2024 -0400
GH-41654: [MATLAB] Add new `arrow.c.Schema` MATLAB class which wraps a C
Data Interface format `ArrowSchema` C struct (#41674)
### Rationale for this change
Now that the MATLAB interface has support for `arrow.tabular.RecordBatch`
and `arrow.array.Array`, we should add support for the [C Data
Interface](https://arrow.apache.org/docs/format/CDataInterface.html) format.
The C Data Interface is based around two C struct definitions: (1)
`ArrowArray` and (2) `ArrowSchema`.
Now that #41653 (add support for `arrow.c.Array`) has been addressed, we
should add another new MATLAB class (e.g. `arrow.c.Schema`) which wraps the
underlying `ArrowSchema` C struct.
Once we have added these two MATLAB classes, we can then add import and
export functionality to share the Arrow memory between multiple language
runtimes running in the same process.
This would help enable workflows like sharing Arrow data between the MATLAB
Interface to Arrow and `pyarrow` running within the MATLAB process via the
[MATLAB interface to
Python](https://www.mathworks.com/help/matlab/call-python-libraries.html).
### What changes are included in this PR?
1. Added a new C++ proxy class called `arrow::matlab::c::proxy::Schema`
which wraps an `ArrowSchema` struct pointer. This class is registered as the
proxy `arrow.c.proxy.Schema` in order to make it accessible to MATLAB.
2. Added a new MATLAB class called `arrow.c.Schema` that has an
`arrow.c.proxy.Schema` instance. It has one public property named `Address`,
which is a scalar `uint64`. This property is the memory address of the
`ArrowSchema` struct owned by `arrow.c.proxy.Schema`.
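
As an illustration of item 2, here is a minimal, self-contained C++ sketch (not the code in this PR) of an object that owns an `ArrowSchema` by value and reports its address as a 64-bit integer. The struct layout follows the C Data Interface specification. `SchemaHolder` and `from_address` are hypothetical names used only for this example, and the libmexclass plumbing is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Layout of ArrowSchema as defined by the Arrow C Data Interface
// (declared in arrow/c/abi.h in the Arrow C++ sources).
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  std::int64_t flags;
  std::int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

// Hypothetical stand-in for the proxy class: it owns the struct by value,
// so the reported address stays valid for as long as the holder is alive.
class SchemaHolder {
 public:
  SchemaHolder() : schema_{} {}  // zero-initialized, so release == nullptr

  // Copying would yield a second struct at a different address.
  SchemaHolder(const SchemaHolder&) = delete;
  SchemaHolder& operator=(const SchemaHolder&) = delete;

  // Address of the owned struct as an integer that can cross a language
  // boundary (for example, as a MATLAB uint64 scalar).
  std::uint64_t address() const {
    return static_cast<std::uint64_t>(
        reinterpret_cast<std::uintptr_t>(&schema_));
  }

  // What a consumer in the same process would do with that integer.
  static ArrowSchema* from_address(std::uint64_t address) {
    assert(address != 0);
    return reinterpret_cast<ArrowSchema*>(
        static_cast<std::uintptr_t>(address));
  }

 private:
  ArrowSchema schema_;
};
```

The integer is only meaningful inside the process that produced it, and only while the owning object is still alive.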
### Are these changes tested?
Yes.
1. Added a new test class called `test/arrow/c/tSchema.m`.
2. @sgilmore10 and I created a prototype for importing and exporting Arrow
`Array`s via the C Data Interface format
[here](https://github.com/mathworks/arrow/tree/arrow-array-address). We were
able to share Arrow `Array`s and `RecordBatch`es between `mlarrow` and
`pyarrow`. Our plan now is to submit the necessary MATLAB code incrementally.
### Are there any user-facing changes?
Yes.
1. The `arrow.c.Schema` class is user-facing. However, it's only intended
for "advanced" use-cases. In the future, we may add higher-level functionality
on top of the C Data Interface so that users don't need to interact with it
directly.
2. **NOTE**: On destruction, `arrow.c.proxy.Schema` will check to see if
the `ArrowSchema` has already been consumed by an importer. If not,
`arrow.c.proxy.Schema`'s destructor will call the release callback on the
`ArrowSchema` to avoid memory leaks. To the best of our knowledge, this is
similar to how the [Arrow PyCapsule
Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html)
works.
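
The ownership rule in the note above can be sketched in standalone C++. This is an illustration under stated assumptions, not the PR's implementation: `SchemaOwner`, `counting_release`, `export_toy_schema`, and `import_by_move` are hypothetical names, and the "producer" only counts how often its release callback runs. The move step follows the C Data Interface convention of copying the struct bitwise and then setting the source's `release` to null without calling it.

```cpp
#include <cassert>
#include <cstdint>

// Layout of ArrowSchema as defined by the Arrow C Data Interface.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  std::int64_t flags;
  std::int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

// Hypothetical owner: releases the struct on destruction only if no
// importer has taken it over (release is still non-null).
class SchemaOwner {
 public:
  SchemaOwner() : schema_{} {}
  SchemaOwner(const SchemaOwner&) = delete;
  SchemaOwner& operator=(const SchemaOwner&) = delete;
  ~SchemaOwner() {
    if (schema_.release != nullptr) {
      schema_.release(&schema_);
    }
  }
  ArrowSchema* get() { return &schema_; }

 private:
  ArrowSchema schema_;
};

// Toy release callback: bumps a counter stored in private_data and marks
// the struct as released, as the specification requires of producers.
void counting_release(ArrowSchema* schema) {
  assert(schema != nullptr && schema->release != nullptr);
  ++(*static_cast<int*>(schema->private_data));
  schema->release = nullptr;
}

// Toy producer: fills `out` with an int32 schema (format string "i").
void export_toy_schema(ArrowSchema* out, int* release_counter) {
  *out = ArrowSchema{};
  out->format = "i";
  out->name = "";
  out->release = &counting_release;
  out->private_data = release_counter;
}

// Toy importer: takes ownership by moving the struct. The source is
// marked released without invoking its callback.
ArrowSchema import_by_move(ArrowSchema* src) {
  ArrowSchema dst = *src;
  src->release = nullptr;
  return dst;
}
```

With these pieces, an exported schema that is never imported is released exactly once when its owner goes out of scope, while an imported schema is released only by the importer, so the owner's destructor does nothing.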
### Future Directions
1. #41656
2. We should probably follow up with a PR to create shared infrastructure
for `arrow.c.Array` and `arrow.c.Schema`, since they are almost identical in
design and implementation.
### Notes
1. Thank you @sgilmore10 for your help with this pull request!
* GitHub Issue: #41654
Authored-by: Kevin Gurney <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
---
matlab/src/cpp/arrow/matlab/c/proxy/array.h | 4 +-
.../arrow/matlab/c/proxy/{array.h => schema.cc} | 34 +++++++++------
.../cpp/arrow/matlab/c/proxy/{array.h => schema.h} | 12 +++---
matlab/src/cpp/arrow/matlab/proxy/factory.cc | 2 +
matlab/src/matlab/+arrow/+c/Schema.m | 37 +++++++++++++++++
matlab/test/arrow/c/tSchema.m | 48 ++++++++++++++++++++++
matlab/tools/cmake/BuildMatlabArrowInterface.cmake | 3 +-
7 files changed, 116 insertions(+), 24 deletions(-)
diff --git a/matlab/src/cpp/arrow/matlab/c/proxy/array.h b/matlab/src/cpp/arrow/matlab/c/proxy/array.h
index b42b2dcd9c..bb35807fcd 100644
--- a/matlab/src/cpp/arrow/matlab/c/proxy/array.h
+++ b/matlab/src/cpp/arrow/matlab/c/proxy/array.h
@@ -34,8 +34,6 @@ class Array : public libmexclass::proxy::Proxy {
void getAddress(libmexclass::proxy::method::Context& context);
struct ArrowArray arrowArray;
-
- // struct ArrowArray* arrowArray;
};
-} // namespace arrow::matlab::c::proxy
\ No newline at end of file
+} // namespace arrow::matlab::c::proxy
diff --git a/matlab/src/cpp/arrow/matlab/c/proxy/array.h b/matlab/src/cpp/arrow/matlab/c/proxy/schema.cc
similarity index 55%
copy from matlab/src/cpp/arrow/matlab/c/proxy/array.h
copy to matlab/src/cpp/arrow/matlab/c/proxy/schema.cc
index b42b2dcd9c..7f239f5628 100644
--- a/matlab/src/cpp/arrow/matlab/c/proxy/array.h
+++ b/matlab/src/cpp/arrow/matlab/c/proxy/schema.cc
@@ -15,27 +15,35 @@
// specific language governing permissions and limitations
// under the License.
+#include <cstddef>
#include "arrow/c/abi.h"
+#include "arrow/matlab/c/proxy/schema.h"
+
#include "libmexclass/proxy/Proxy.h"
namespace arrow::matlab::c::proxy {
-class Array : public libmexclass::proxy::Proxy {
- public:
- Array();
-
- ~Array();
+Schema::Schema() : arrowSchema{} { REGISTER_METHOD(Schema, getAddress); }
- static libmexclass::proxy::MakeResult make(
- const libmexclass::proxy::FunctionArguments& constructor_arguments);
+Schema::~Schema() {
+ if (arrowSchema.release != NULL) {
+ arrowSchema.release(&arrowSchema);
+ arrowSchema.release = NULL;
+ }
+}
- protected:
- void getAddress(libmexclass::proxy::method::Context& context);
+libmexclass::proxy::MakeResult Schema::make(
+ const libmexclass::proxy::FunctionArguments& constructor_arguments) {
+ return std::make_shared<Schema>();
+}
- struct ArrowArray arrowArray;
+void Schema::getAddress(libmexclass::proxy::method::Context& context) {
+ namespace mda = ::matlab::data;
- // struct ArrowArray* arrowArray;
-};
+ mda::ArrayFactory factory;
+ auto address = reinterpret_cast<uint64_t>(&arrowSchema);
+ context.outputs[0] = factory.createScalar(address);
+}
-} // namespace arrow::matlab::c::proxy
\ No newline at end of file
+} // namespace arrow::matlab::c::proxy
diff --git a/matlab/src/cpp/arrow/matlab/c/proxy/array.h b/matlab/src/cpp/arrow/matlab/c/proxy/schema.h
similarity index 86%
copy from matlab/src/cpp/arrow/matlab/c/proxy/array.h
copy to matlab/src/cpp/arrow/matlab/c/proxy/schema.h
index b42b2dcd9c..8f781ea9c7 100644
--- a/matlab/src/cpp/arrow/matlab/c/proxy/array.h
+++ b/matlab/src/cpp/arrow/matlab/c/proxy/schema.h
@@ -21,11 +21,11 @@
namespace arrow::matlab::c::proxy {
-class Array : public libmexclass::proxy::Proxy {
+class Schema : public libmexclass::proxy::Proxy {
public:
- Array();
+ Schema();
- ~Array();
+ ~Schema();
static libmexclass::proxy::MakeResult make(
const libmexclass::proxy::FunctionArguments& constructor_arguments);
@@ -33,9 +33,7 @@ class Array : public libmexclass::proxy::Proxy {
protected:
void getAddress(libmexclass::proxy::method::Context& context);
- struct ArrowArray arrowArray;
-
- // struct ArrowArray* arrowArray;
+ struct ArrowSchema arrowSchema;
};
-} // namespace arrow::matlab::c::proxy
\ No newline at end of file
+} // namespace arrow::matlab::c::proxy
diff --git a/matlab/src/cpp/arrow/matlab/proxy/factory.cc b/matlab/src/cpp/arrow/matlab/proxy/factory.cc
index cf13ed6aa5..d7a8fa9ac2 100644
--- a/matlab/src/cpp/arrow/matlab/proxy/factory.cc
+++ b/matlab/src/cpp/arrow/matlab/proxy/factory.cc
@@ -26,6 +26,7 @@
#include "arrow/matlab/array/proxy/timestamp_array.h"
#include "arrow/matlab/buffer/proxy/buffer.h"
#include "arrow/matlab/c/proxy/array.h"
+#include "arrow/matlab/c/proxy/schema.h"
#include "arrow/matlab/error/error.h"
#include "arrow/matlab/io/csv/proxy/table_reader.h"
#include "arrow/matlab/io/csv/proxy/table_writer.h"
@@ -101,6 +102,7 @@ libmexclass::proxy::MakeResult Factory::make_proxy(
REGISTER_PROXY(arrow.io.csv.proxy.TableWriter , arrow::matlab::io::csv::proxy::TableWriter);
REGISTER_PROXY(arrow.io.csv.proxy.TableReader , arrow::matlab::io::csv::proxy::TableReader);
REGISTER_PROXY(arrow.c.proxy.Array , arrow::matlab::c::proxy::Array);
+ REGISTER_PROXY(arrow.c.proxy.Schema , arrow::matlab::c::proxy::Schema);
// clang-format on
return libmexclass::error::Error{error::UNKNOWN_PROXY_ERROR_ID,
diff --git a/matlab/src/matlab/+arrow/+c/Schema.m b/matlab/src/matlab/+arrow/+c/Schema.m
new file mode 100644
index 0000000000..29eba59016
--- /dev/null
+++ b/matlab/src/matlab/+arrow/+c/Schema.m
@@ -0,0 +1,37 @@
+%SCHEMA Wrapper for an Arrow C Data Interface format ArrowSchema C struct pointer.
+
+% Licensed to the Apache Software Foundation (ASF) under one or more
+% contributor license agreements. See the NOTICE file distributed with
+% this work for additional information regarding copyright ownership.
+% The ASF licenses this file to you under the Apache License, Version
+% 2.0 (the "License"); you may not use this file except in compliance
+% with the License. You may obtain a copy of the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS,
+% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+% implied. See the License for the specific language governing
+% permissions and limitations under the License.
+classdef Schema < matlab.mixin.Scalar
+
+ properties (Hidden, SetAccess=private, GetAccess=public)
+ Proxy
+ end
+
+ properties(Dependent, GetAccess=public, SetAccess=private)
+ Address(1, 1) uint64
+ end
+
+ methods
+ function obj = Schema()
+ proxyName = "arrow.c.proxy.Schema";
+ obj.Proxy = arrow.internal.proxy.create(proxyName);
+ end
+
+ function address = get.Address(obj)
+ address = obj.Proxy.getAddress();
+ end
+ end
+end
\ No newline at end of file
diff --git a/matlab/test/arrow/c/tSchema.m b/matlab/test/arrow/c/tSchema.m
new file mode 100644
index 0000000000..16dcf1965b
--- /dev/null
+++ b/matlab/test/arrow/c/tSchema.m
@@ -0,0 +1,48 @@
+%TSCHEMA Defines unit tests for arrow.c.Schema.
+
+% Licensed to the Apache Software Foundation (ASF) under one or more
+% contributor license agreements. See the NOTICE file distributed with
+% this work for additional information regarding copyright ownership.
+% The ASF licenses this file to you under the Apache License, Version
+% 2.0 (the "License"); you may not use this file except in compliance
+% with the License. You may obtain a copy of the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS,
+% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+% implied. See the License for the specific language governing
+% permissions and limitations under the License.
+classdef tSchema < matlab.unittest.TestCase
+
+ methods (Test)
+ function TestClassStructure(testCase)
+ schema = arrow.c.Schema();
+
+ % Verify schema is an instance of arrow.c.Schema.
+ testCase.verifyInstanceOf(schema, "arrow.c.Schema");
+
+ % Verify schema has one public property named Address.
+ props = properties(schema);
+ testCase.verifyEqual(props, {'Address'});
+ end
+
+ function TestAddressProperty(testCase)
+ schema = arrow.c.Schema();
+
+ % It's impossible to know what the value of Address will be.
+ % Just verify Address is a scalar uint64.
+ address = schema.Address;
+ testCase.verifyInstanceOf(address, "uint64");
+ testCase.verifyTrue(isscalar(address));
+ end
+
+ function TestAddressNoSetter(testCase)
+ % Verify the Address property is read-only.
+ schema = arrow.c.Schema();
+ fcn = @() setfield(schema, "Address", uint64(10));
+ testCase.verifyError(fcn, "MATLAB:class:SetProhibited");
+ end
+ end
+end
\ No newline at end of file
diff --git a/matlab/tools/cmake/BuildMatlabArrowInterface.cmake b/matlab/tools/cmake/BuildMatlabArrowInterface.cmake
index 7a8cf8f403..8f37bef77b 100644
--- a/matlab/tools/cmake/BuildMatlabArrowInterface.cmake
+++ b/matlab/tools/cmake/BuildMatlabArrowInterface.cmake
@@ -76,7 +76,8 @@ set(MATLAB_ARROW_LIBMEXCLASS_CLIENT_PROXY_SOURCES
"${CMAKE_SOURCE_DIR}/src/cpp/a
"${CMAKE_SOURCE_DIR}/src/cpp/arrow/matlab/io/csv/proxy/table_reader.cc"
"${CMAKE_SOURCE_DIR}/src/cpp/arrow/matlab/index/validate.cc"
"${CMAKE_SOURCE_DIR}/src/cpp/arrow/matlab/buffer/proxy/buffer.cc"
- "${CMAKE_SOURCE_DIR}/src/cpp/arrow/matlab/c/proxy/array.cc")
+ "${CMAKE_SOURCE_DIR}/src/cpp/arrow/matlab/c/proxy/array.cc"
+ "${CMAKE_SOURCE_DIR}/src/cpp/arrow/matlab/c/proxy/schema.cc")
set(MATLAB_ARROW_LIBMEXCLASS_CLIENT_PROXY_FACTORY_INCLUDE_DIR "${CMAKE_SOURCE_DIR}/src/cpp/arrow/matlab/proxy")