rok commented on a change in pull request #8510:
URL: https://github.com/apache/arrow/pull/8510#discussion_r722161976
##########
File path: cpp/src/arrow/extension_type_test.cc
##########
@@ -333,4 +334,144 @@ TEST_F(TestExtensionType, ValidateExtensionArray) {
ASSERT_OK(ext_arr4->ValidateFull());
}
+class TensorArray : public ExtensionArray {
+ public:
+ using ExtensionArray::ExtensionArray;
+};
+
+class TensorArrayType : public ExtensionType {
+ public:
+ explicit TensorArrayType(const std::shared_ptr<DataType>& type,
+ const std::vector<int64_t>& shape,
+ const std::vector<int64_t>& strides)
+ : ExtensionType(type), type_(type), shape_(shape), strides_(strides) {}
+
+ std::shared_ptr<DataType> type() const { return type_; }
+ std::vector<int64_t> shape() const { return shape_; }
+ std::vector<int64_t> strides() const { return strides_; }
+
+ std::string extension_name() const override {
+ std::stringstream s;
+ s << "ext-array-tensor-type<type=" << *storage_type() << ", shape=(";
+ for (uint64_t i = 0; i < shape_.size(); i++) {
+ s << shape_[i];
+ if (i < shape_.size() - 1) {
+ s << ", ";
+ }
+ }
+ s << "), strides=(";
+ for (uint64_t i = 0; i < strides_.size(); i++) {
+ s << strides_[i];
+ if (i < strides_.size() - 1) {
+ s << ", ";
+ }
+ }
+ s << ")>";
+ return s.str();
+ }
+
+ bool ExtensionEquals(const ExtensionType& other) const override {
+ return this->shape() == static_cast<const TensorArrayType&>(other).shape();
Review comment:
> > Ah, I misunderstood your suggestion. Why would you need it so loose?
>
> Practically speaking, this means that one cannot have, e.g., a `(10, 5, 4)`
> shape Tensor and a `(10, 6, 2)` Tensor in separate parquet files in the
> same dataset -- IIRC the Dataset API will complain that their metadata
> doesn't agree (due to the strict equality comparison). I do run into these
> sorts of cases in the datasets that I deal with.
>
> More formally, I would argue that parameterising Tensor Types on `shape`
> and `stride` introduces an infinite number of parameterisations and I'm not
> sure that this class of parameterisations is useful. This doesn't imply that
> `shape` and `stride` should not be attributes on a Tensor Type!

Got it. An infinite number of types does sound redundant.
The only reason I can think of to keep dimensions and strides in the equality
comparison is if we had compute kernels that needed to identify the type in
advance. But even that can probably be resolved at runtime.
In that case, do we even want to keep `ndim` in the equality comparison?
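
To make the options concrete, here is a minimal standalone sketch of the equality policies being discussed. It deliberately does not use Arrow's classes: `TensorTypeSketch`, `StrictEquals`, `NdimEquals` and `ValueTypeEquals` are hypothetical names made up for illustration, and the real `ExtensionEquals` override could look quite different.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a tensor extension type (NOT Arrow's API).
struct TensorTypeSketch {
  std::string value_type;        // element type name, e.g. "int64"
  std::vector<int64_t> shape;    // kept as an attribute in every policy
  std::vector<int64_t> strides;  // kept as an attribute in every policy

  std::size_t ndim() const { return shape.size(); }
};

// Strict policy: shape and strides are part of the type identity, so every
// distinct shape yields a distinct type.
inline bool StrictEquals(const TensorTypeSketch& a, const TensorTypeSketch& b) {
  return a.value_type == b.value_type && a.shape == b.shape &&
         a.strides == b.strides;
}

// Looser policy: only the element type and the number of dimensions matter.
inline bool NdimEquals(const TensorTypeSketch& a, const TensorTypeSketch& b) {
  return a.value_type == b.value_type && a.ndim() == b.ndim();
}

// Loosest policy: only the element type matters; shape, strides and ndim
// would all be resolved at runtime.
inline bool ValueTypeEquals(const TensorTypeSketch& a,
                            const TensorTypeSketch& b) {
  return a.value_type == b.value_type;
}
```

Under this sketch, a `(10, 5, 4)` tensor type and a `(10, 6, 2)` tensor type with the same element type compare unequal under `StrictEquals` but equal under `NdimEquals` and `ValueTypeEquals`, which is the dataset case described in the quoted comment.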