sjperkins commented on a change in pull request #8510:
URL: https://github.com/apache/arrow/pull/8510#discussion_r722148585



##########
File path: cpp/src/arrow/extension_type_test.cc
##########
@@ -333,4 +334,144 @@ TEST_F(TestExtensionType, ValidateExtensionArray) {
   ASSERT_OK(ext_arr4->ValidateFull());
 }
 
+class TensorArray : public ExtensionArray {
+ public:
+  using ExtensionArray::ExtensionArray;
+};
+
+class TensorArrayType : public ExtensionType {
+ public:
+  explicit TensorArrayType(const std::shared_ptr<DataType>& type,
+                           const std::vector<int64_t>& shape,
+                           const std::vector<int64_t>& strides)
+      : ExtensionType(type), type_(type), shape_(shape), strides_(strides) {}
+
+  std::shared_ptr<DataType> type() const { return type_; }
+  std::vector<int64_t> shape() const { return shape_; }
+  std::vector<int64_t> strides() const { return strides_; }
+
+  std::string extension_name() const override {
+    std::stringstream s;
+    s << "ext-array-tensor-type<type=" << *storage_type() << ", shape=(";
+    for (uint64_t i = 0; i < shape_.size(); i++) {
+      s << shape_[i];
+      if (i < shape_.size() - 1) {
+        s << ", ";
+      }
+    }
+    s << "), strides=(";
+    for (uint64_t i = 0; i < strides_.size(); i++) {
+      s << strides_[i];
+      if (i < strides_.size() - 1) {
+        s << ", ";
+      }
+    }
+    s << ")>";
+    return s.str();
+  }
+
+  bool ExtensionEquals(const ExtensionType& other) const override {
+    return this->shape() == static_cast<const TensorArrayType&>(other).shape();

Review comment:
       > Ah, I misunderstood your suggestion. Why would you need it so loose?
   
   More practically, this means that one cannot have, e.g., a `(10, 5, 4)`-shaped Tensor and a `(10, 6, 2)`-shaped Tensor in separate Parquet files in the same dataset: IIRC the Dataset API will complain that their metadata doesn't agree (due to the strict equality comparison). I do run into these sorts of cases in the datasets I deal with.
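   To make the failure mode concrete, here is a minimal standalone sketch in plain C++ (deliberately not Arrow's API) that mirrors the strict `ExtensionEquals` in the diff. `TensorTypeInfo`, `StrictEquals` and `AllFragmentsAgree` are hypothetical names, and the "every fragment must carry an equal type" rule is an assumed simplification of whatever the Dataset API actually does when it unifies per-file schemas.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the parameters of TensorArrayType (not Arrow's API).
struct TensorTypeInfo {
  std::string value_type;        // element type, e.g. "float64"
  std::vector<int64_t> shape;    // e.g. {10, 5, 4}
  std::vector<int64_t> strides;  // row-major byte strides, e.g. {160, 32, 8}
};

// Mirrors the strict comparison in the diff: equal only if the shapes match.
bool StrictEquals(const TensorTypeInfo& a, const TensorTypeInfo& b) {
  return a.shape == b.shape;
}

// Assumed dataset-level rule: every fragment must carry a type equal to the first.
bool AllFragmentsAgree(const std::vector<TensorTypeInfo>& fragments) {
  for (std::size_t i = 1; i < fragments.size(); ++i) {
    if (!StrictEquals(fragments[0], fragments[i])) {
      return false;
    }
  }
  return true;
}
```

   Two `float64` fragments with shapes `(10, 5, 4)` and `(10, 6, 2)` make `AllFragmentsAgree` return `false`, because only the shapes are compared and they differ, even though both hold the same kind of data.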
   
   I would argue that parameterising Tensor types on `shape` and `strides` introduces an infinite number of parameterisations, and I'm not sure that this class of parameterisations is useful. This doesn't imply that `shape` and `strides` should not be attributes on a Tensor type!
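   One way to read this suggestion, sketched as standalone C++ under the same caveats (hypothetical `TensorTypeInfo` and `LooseEquals`, not Arrow's API): keep `shape` and `strides` as attributes on the type, but leave them out of the equality check, so that type identity depends only on the element type.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a tensor extension type (not Arrow's API).
// shape and strides are still carried as attributes, but they do not
// take part in deciding whether two types are the same type.
struct TensorTypeInfo {
  std::string value_type;        // element type, e.g. "float64"
  std::vector<int64_t> shape;    // attribute only
  std::vector<int64_t> strides;  // attribute only
};

// Loose equality: same element type means same type, whatever the shape or
// strides. (Also requiring a.shape.size() == b.shape.size(), i.e. equal ndim,
// would be a stricter middle ground.)
bool LooseEquals(const TensorTypeInfo& a, const TensorTypeInfo& b) {
  return a.value_type == b.value_type;
}
```

   Under this comparison, the `(10, 5, 4)` and `(10, 6, 2)` `float64` tensors from the example are the same type, while each still exposes its own shape. (The `extension_name()` in the diff also embeds the shape and strides, so the name would differ between such files too.)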
   
   



