andishgar opened a new issue, #47520: URL: https://github.com/apache/arrow/issues/47520
### Describe the bug, including details regarding any error messages, version, and platform.

The following method causes illegal memory access when creating from a combination of negative zero and non-zero values, and produces incorrect tensor values (potentially leading to illegal memory access) when creating from a column-major tensor.

https://github.com/apache/arrow/blob/fddd35607ed09ac5c6f1b358c8feb4f207f390f3/cpp/src/arrow/sparse_tensor.h#L583-L594

1- The following code leads to illegal memory access.

```c++
TEST(MyTest, SegFault) {
  // clang-format off
  std::vector<float> data{
    -0.0, -0.0, -0.0,
    -0.0, -0.0, -0.0,
    -0.0, -0.0, -0.0,
    -1.0, -0.0, -0.0,
  };
  // clang-format on
  std::vector<int64_t> shape = {4, 3};
  auto buffer = Buffer::FromVector(data);
  ASSERT_OK_AND_ASSIGN(auto dense_tensor, Tensor::Make(float32(), buffer, shape));
  ASSERT_OK_AND_ASSIGN(auto sparse_coo_tensor,
                       SparseCOOTensor::Make(*dense_tensor, int64()));
  ARROW_LOGGER_INFO("", sparse_coo_tensor->sparse_index()->non_zero_length());
}
```

and the error is:

```
: 1
mimalloc: error: buffer overflow in heap block 0x0200000100C0 of size 152: write after 152 bytes
Process finished with exit code 134 (interrupted by signal 6:SIGABRT)
```

The reason for this is rooted in the following code.

https://github.com/apache/arrow/blob/fddd35607ed09ac5c6f1b358c8feb4f207f390f3/cpp/src/arrow/tensor/coo_converter.cc#L179

In the above code, both positive and negative zero are considered zero; however, in the following code, negative zero is treated as a non-zero value.

https://github.com/apache/arrow/blob/fddd35607ed09ac5c6f1b358c8feb4f207f390f3/cpp/src/arrow/tensor/coo_converter.cc#L193
https://github.com/apache/arrow/blob/fddd35607ed09ac5c6f1b358c8feb4f207f390f3/cpp/src/arrow/tensor/coo_converter.cc#L65-L66

Note that in the above code, the type of `c_value_type` is always an unsigned integer due to the following code.
https://github.com/apache/arrow/blob/fddd35607ed09ac5c6f1b358c8feb4f207f390f3/cpp/src/arrow/tensor/converter_internal.h#L22-L88

2- Incorrect tensor values when creating a `SparseCOOTensor` from a column-major tensor via `SparseCOOTensor::Make`

```c++
TEST(My, ColumnMajor) {
  // clang-format off
  std::vector<int> data{
    1, 4, 7, 10,
    2, 5, 8, 11,
    3, 6, 9, 12
  };
  // clang-format on
  std::vector<int> data_2{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
  auto buffer = Buffer::FromVector(data);
  auto buffer_2 = Buffer::FromVector(data_2);
  std::vector<int64_t> shape = {4, 3};
  std::vector<int64_t> strides = {sizeof(int), 4 * sizeof(int)};

  // tensor is column-major, tensor_2 is row-major; both hold the same logical values.
  ASSERT_OK_AND_ASSIGN(auto tensor, Tensor::Make(int32(), buffer, shape, strides));
  ASSERT_OK_AND_ASSIGN(auto tensor_2, Tensor::Make(int32(), buffer_2, shape));
  ASSERT_TRUE(tensor->Equals(*tensor_2));
  ASSERT_TRUE(tensor->is_contiguous());
  ASSERT_TRUE(tensor->is_column_major());

  ASSERT_OK_AND_ASSIGN(auto sparse_tensor, SparseCOOTensor::Make(*tensor));
  ASSERT_OK_AND_ASSIGN(auto new_tensor, sparse_tensor->ToTensor());
  ASSERT_EQ(12, sparse_tensor->non_zero_length());
  ASSERT_TRUE(new_tensor->is_contiguous());
  ASSERT_TRUE(new_tensor->is_row_major());
  // new_tensor is not equal to tensor!!
  ASSERT_FALSE(new_tensor->Equals(*tensor));
}
```

`ASSERT_FALSE(new_tensor->Equals(*tensor));` passes because of the following code.

https://github.com/apache/arrow/blob/fddd35607ed09ac5c6f1b358c8feb4f207f390f3/cpp/src/arrow/tensor/coo_converter.cc#L86-L91

Given that logic, the index `{2,3}` is produced for a column-major contiguous tensor with shape `{4,3}`, which is not a correct index: the second coordinate must be less than 3.

### Component(s)

C++
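
For reference, the mismatch behind the first problem (negative zero counted as zero in one place and as non-zero in another) can be reproduced without Arrow. The sketch below is not Arrow's code: `CountNonZeroByValue` and `CountNonZeroByBits` are hypothetical helpers written for illustration, and it assumes IEEE 754 32-bit floats. It only shows that a floating-point comparison and an unsigned-integer view of the same bytes disagree on `-0.0`.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of Arrow): counts values that compare
// unequal to zero as floats. Since -0.0f == 0.0f, negative zeros are skipped.
inline int64_t CountNonZeroByValue(const std::vector<float>& values) {
  int64_t count = 0;
  for (float v : values) {
    if (v != 0.0f) ++count;
  }
  return count;
}

// Hypothetical helper (not part of Arrow): counts values whose raw 32-bit
// pattern is non-zero, i.e. the float is viewed as an unsigned integer.
// Assuming IEEE 754, -0.0f has only the sign bit set, so it is counted.
inline int64_t CountNonZeroByBits(const std::vector<float>& values) {
  int64_t count = 0;
  for (float v : values) {
    uint32_t bits = 0;
    std::memcpy(&bits, &v, sizeof(bits));  // well-defined way to read the bits
    if (bits != 0) ++count;
  }
  return count;
}
```

If the output buffers are sized with the first kind of count and filled with the second kind of check, more entries are written than were allocated, which would be consistent with the heap overflow reported above.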