jorisvandenbossche commented on a change in pull request #8510:
URL: https://github.com/apache/arrow/pull/8510#discussion_r791482403
##########
File path: cpp/src/arrow/extension_type_test.cc
##########
@@ -333,4 +334,144 @@ TEST_F(TestExtensionType, ValidateExtensionArray) {
ASSERT_OK(ext_arr4->ValidateFull());
}
+class TensorArray : public ExtensionArray {
+ public:
+ using ExtensionArray::ExtensionArray;
+};
+
+class TensorArrayType : public ExtensionType {
+ public:
+ explicit TensorArrayType(const std::shared_ptr<DataType>& type,
+ const std::vector<int64_t>& shape,
+ const std::vector<int64_t>& strides)
+ : ExtensionType(type), type_(type), shape_(shape), strides_(strides) {}
+
+ std::shared_ptr<DataType> type() const { return type_; }
+ std::vector<int64_t> shape() const { return shape_; }
+ std::vector<int64_t> strides() const { return strides_; }
+
+ std::string extension_name() const override {
+ std::stringstream s;
+ s << "ext-array-tensor-type<type=" << *storage_type() << ", shape=(";
+ for (uint64_t i = 0; i < shape_.size(); i++) {
+ s << shape_[i];
+ if (i < shape_.size() - 1) {
+ s << ", ";
+ }
+ }
+ s << "), strides=(";
+ for (uint64_t i = 0; i < strides_.size(); i++) {
+ s << strides_[i];
+ if (i < strides_.size() - 1) {
+ s << ", ";
+ }
+ }
+ s << ")>";
+ return s.str();
+ }
+
+ bool ExtensionEquals(const ExtensionType& other) const override {
+ return this->shape() == static_cast<const TensorArrayType&>(other).shape();
Review comment:
> I removed the `ndim` check and made equality comparison only check for
type name (`ext-array-tensor-type`).
I don't really understand the argument about not being strict in the type
equality.
Currently, the example code here uses `FixedSizeListArray` as the underlying
storage type, for _constant_ dimension (and I would say constant shape and
strides, as well?) for the full array.
So if you have multiple files in the same dataset where the shape of the
tensors differs between files, how can that ever be combined in a single schema
/ table? FixedSizeListArray with a different size are not compatible with each
other.
If you want to be flexible regarding the shape of each tensor value, it
seems that is rather what is covered in
https://issues.apache.org/jira/browse/ARROW-8714, where the storage type could
be a struct with one field the data as a variable sized list array and the
other field storing the shape of each tensor value.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]