[I] [C++][Tensor] Incorrect logic for creating arrow::SparseCOOTensor from Tensor via arrow::row::SparseCOOTensor (illegal memory access and incorrect values) [arrow]

via GitHub Sat, 06 Sep 2025 21:05:27 -0700


andishgar opened a new issue, #47520:
URL: https://github.com/apache/arrow/issues/47520


   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   The method linked below performs an illegal memory access when a 
SparseCOOTensor is created from a tensor that mixes negative zero and non-zero 
values. It also produces incorrect tensor values (potentially leading to 
illegal memory access) when the source is a column-major tensor.
   
   
https://github.com/apache/arrow/blob/fddd35607ed09ac5c6f1b358c8feb4f207f390f3/cpp/src/arrow/sparse_tensor.h#L583-L594
   
   1- The following code leads to illegal memory access.
   ```c++
   TEST(MyTest, SegFault) {
     // clang-format off
     std::vector<float> data{
       -0.0, -0.0, -0.0,
       -0.0, -0.0, -0.0,
       -0.0, -0.0, -0.0,
       -1.0, -0.0, -0.0,
       };
   
     // clang-format on
   
     std::vector<int64_t> shape = {4, 3};
     auto buffer = Buffer::FromVector(data);
     ASSERT_OK_AND_ASSIGN(auto dense_tensor, Tensor::Make(float32(), buffer, shape));
     ASSERT_OK_AND_ASSIGN(auto sparse_coo_tensor,
                          SparseCOOTensor::Make(*dense_tensor, int64()));
     ARROW_LOGGER_INFO("", sparse_coo_tensor->sparse_index()->non_zero_length());
   }
   ``` 
   and the error is:
   
   ```
   : 1
   mimalloc: error: buffer overflow in heap block 0x0200000100C0 of size 152: 
write after 152 bytes
   Process finished with exit code 134 (interrupted by signal 6:SIGABRT)
   ``` 
   The reason for this is rooted in the following code.
   
   
https://github.com/apache/arrow/blob/fddd35607ed09ac5c6f1b358c8feb4f207f390f3/cpp/src/arrow/tensor/coo_converter.cc#L179
   
   In the above code, both positive and negative zero are considered zero; 
however, in the following code, negative zero is treated as a non-zero value.
   
   
https://github.com/apache/arrow/blob/fddd35607ed09ac5c6f1b358c8feb4f207f390f3/cpp/src/arrow/tensor/coo_converter.cc#L193
   
https://github.com/apache/arrow/blob/fddd35607ed09ac5c6f1b358c8feb4f207f390f3/cpp/src/arrow/tensor/coo_converter.cc#L65-L66
   Note that in the above code, the type of `c_value_type` is always an 
unsigned integer due to the following code.
   
https://github.com/apache/arrow/blob/fddd35607ed09ac5c6f1b358c8feb4f207f390f3/cpp/src/arrow/tensor/converter_internal.h#L22-L88
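
The mismatch described above can be reproduced outside Arrow. The following is a minimal sketch, assuming a 32-bit IEEE 754 `float`; the helper names are hypothetical and are not Arrow functions. It counts non-zero elements twice, once by comparing floating-point values and once by comparing the raw bit pattern as an unsigned integer. If a buffer is sized from the first count and filled using the second test, writes past the end of the buffer would be expected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Value comparison: under IEEE 754, -0.0f == 0.0f, so negative zero is "zero".
inline bool IsNonZeroByValue(float v) { return v != 0.0f; }

// Bit-pattern comparison: -0.0f has only the sign bit set (0x80000000),
// so as an unsigned integer it is non-zero.
inline bool IsNonZeroByBits(float v) {
  static_assert(sizeof(std::uint32_t) == sizeof(float),
                "this sketch assumes a 32-bit float");
  std::uint32_t bits = 0;
  std::memcpy(&bits, &v, sizeof(bits));
  return bits != 0;
}

// Hypothetical counting passes, one per notion of "non-zero".
inline std::size_t CountNonZeroByValue(const float* data, std::size_t n) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (IsNonZeroByValue(data[i])) ++count;
  }
  return count;
}

inline std::size_t CountNonZeroByBits(const float* data, std::size_t n) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (IsNonZeroByBits(data[i])) ++count;
  }
  return count;
}
```

Applied to the 12 values from the test above (eleven `-0.0` and one `-1.0`), the two counts disagree: one element by value, twelve by bit pattern. The value-based count matches the `1` printed in the log.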
   
   2- Incorrect tensor values when creating a SparseCOOTensor from a 
column-major tensor via SparseCOOTensor::Make
   ```c++
   TEST(My, ColumnMajor) {
     // clang-format off
     std::vector<int> data{
       1, 4, 7, 10,
       2, 5, 8, 11,
       3, 6, 9, 12
                         };
     // clang-format on
     std::vector<int> data_2{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
     auto buffer = Buffer::FromVector(data);
     auto buffer_2 = Buffer::FromVector(data_2);
     std::vector<int64_t> shape = {4, 3};
     std::vector<int64_t> strides = {sizeof(int), 4 * sizeof(int)};
     ASSERT_OK_AND_ASSIGN(auto tensor, Tensor::Make(int32(), buffer, shape, strides));
     ASSERT_OK_AND_ASSIGN(auto tensor_2, Tensor::Make(int32(), buffer_2, shape));
     ASSERT_TRUE(tensor->Equals(*tensor_2));
     ASSERT_TRUE(tensor->is_contiguous());
     ASSERT_TRUE(tensor->is_column_major());
     ASSERT_OK_AND_ASSIGN(auto sparse_tensor, SparseCOOTensor::Make(*tensor));
     ASSERT_OK_AND_ASSIGN(auto new_tensor, sparse_tensor->ToTensor());
     ASSERT_EQ(12, sparse_tensor->non_zero_length());
     ASSERT_TRUE(new_tensor->is_contiguous());
     ASSERT_TRUE(new_tensor->is_row_major());
     // new_tensor is not equal to tensor!!
     ASSERT_FALSE(new_tensor->Equals(*tensor));
   }
   ``` 
   `ASSERT_FALSE(new_tensor->Equals(*tensor));` passes because of the following 
code.
   
https://github.com/apache/arrow/blob/fddd35607ed09ac5c6f1b358c8feb4f207f390f3/cpp/src/arrow/tensor/coo_converter.cc#L86-L91
   Given that logic, the index `{2,3}` is produced for a column-major 
contiguous tensor with shape `{4,3}`. That index is out of bounds, because the 
second dimension only admits indices 0 through 2.
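
To make the indexing concrete, here is a small sketch of how a linear offset maps to coordinates in a contiguous column-major 2-D tensor. All names are hypothetical and nothing here is taken from Arrow's source. `ReversedRowMajorCoord` is an assumed faulty mapping, included only because it reproduces the `{2,3}` index mentioned above; it is not a claim about what the linked code does.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Coord = std::array<std::int64_t, 2>;

// Contiguous column-major layout: the first index varies fastest, so the
// element at linear offset k of a matrix with `rows` rows is (k % rows, k / rows).
inline Coord ColumnMajorCoord(std::int64_t k, std::int64_t rows) {
  return {k % rows, k / rows};
}

// Hypothetical faulty mapping: decode k as if the buffer were row-major with
// `cols` columns, giving (k / cols, k % cols), then reverse the coordinate order.
inline Coord ReversedRowMajorCoord(std::int64_t k, std::int64_t cols) {
  return {k % cols, k / cols};
}

// A coordinate is valid only if each component lies within its dimension.
inline bool InBounds(const Coord& c, std::int64_t rows, std::int64_t cols) {
  return c[0] >= 0 && c[0] < rows && c[1] >= 0 && c[1] < cols;
}
```

For shape `{4,3}`, the last element (offset 11, value 12 in the test data) sits at `{3,2}` under the column-major mapping, whereas the assumed faulty mapping yields `{2,3}`, which fails the bounds check.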
   
   ### Component(s)
   
   C++


