Copilot commented on code in PR #41870:
URL: https://github.com/apache/arrow/pull/41870#discussion_r3503928701


##########
python/pyarrow/tests/test_table.py:
##########
@@ -1269,6 +1252,70 @@ def test_recordbatch_to_tensor_unsupported():
         batch.to_tensor()
 
 
[email protected]
[email protected]('typ_str', [
+    "uint8", "uint16", "uint32", "uint64",
+    "int8", "int16", "int32", "int64",
+    "float16", "float32", "float64",
+])
+def test_table_to_tensor_uniform_type(typ_str):
+    arr1 = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]
+    arr2 = [[10, 20], [30, 40, 50, 60, 70, 80, 90]]
+    arr3 = [[100, 100, 100, 100, 100, 100], [100, 100, 100]]
+    table = pa.Table.from_arrays(
+        [
+            pa.chunked_array(arr1, type=pa.from_numpy_dtype(typ_str)),
+            pa.chunked_array(arr2, type=pa.from_numpy_dtype(typ_str)),
+            pa.chunked_array(arr3, type=pa.from_numpy_dtype(typ_str)),
+        ], ["a", "b", "c"]
+    )

Review Comment:
   `pa.from_numpy_dtype(...)` typically expects a NumPy dtype (e.g. 
`np.dtype('uint8')`) rather than a string; passing `typ_str` directly can be 
incompatible depending on the implementation. To make the test robust and 
consistent with `test_recordbatch_to_tensor_uniform_type`, convert first via 
`np.dtype(typ_str)` (or use an explicit `pa.uint8()` / etc. mapping).



##########
cpp/src/arrow/tensor.cc:
##########
@@ -28,11 +28,12 @@
 #include <type_traits>
 #include <vector>
 
-#include "arrow/record_batch.h"
 #include "arrow/status.h"
+#include "arrow/table.h"
 #include "arrow/type.h"
 #include "arrow/type_traits.h"
 #include "arrow/util/checked_cast.h"
+#include "arrow/util/float16.h"

Review Comment:
   `tensor.cc` still references `RecordBatch` APIs (e.g. `RecordBatchToTensor`, 
`RecordBatch`-specific accessors in templates), but the `#include 
\"arrow/record_batch.h\"` include was removed in this change. Unless 
`arrow/tensor.h` (or another included header) guarantees the full `RecordBatch` 
definition, this can cause fragile build dependencies or compilation failures. 
Prefer including `arrow/record_batch.h` explicitly here alongside 
`arrow/table.h`.



##########
cpp/src/arrow/tensor.cc:
##########


Review Comment:
   The new `if (!col_type->Equals(result_field->type()))` guard can prevent 
`Field::MergeWith(..., options)` from running when column types are identical. 
That changes behavior for `null_to_nan=true` because 
`options.promote_integer_to_float` is applied via the merge, so identical 
integer columns (or single-column integer inputs) can fail to promote to float 
and therefore cannot represent NaNs correctly. Fix by ensuring the promotion 
logic still runs when `null_to_nan` is enabled (e.g., always merge when 
`null_to_nan && is_integer(result_field->type()->id())`, or add an explicit 
promotion step independent of merges).



##########
cpp/src/arrow/tensor.cc:
##########
@@ -256,26 +263,36 @@ struct ConvertColumnsToTensorVisitor {
 };
 
 template <typename Out>
-struct ConvertColumnsToTensorRowMajorVisitor {
+struct ConvertArrayToTensorRowMajorVisitor {
   Out*& out_values;
   const ArrayData& in_data;
-  int num_cols;
-  int col_idx;
+  int64_t num_cols;
+  int64_t col_idx;
+  int64_t chunk_idx;

Review Comment:
   `chunk_idx` is used as a running row offset (it is incremented by 
`chunk->length()`), not a chunk index. Renaming it to something like 
`row_offset` (or similar) would make the row-major indexing math easier to 
follow and reduce the chance of future off-by-one/offset mistakes.



##########
cpp/src/arrow/tensor.h:
##########
@@ -77,6 +77,10 @@ Status ValidateTensorParameters(const 
std::shared_ptr<DataType>& type,
                                 const std::vector<int64_t>& strides,
                                 const std::vector<std::string>& dim_names);
 
+ARROW_EXPORT
+Status TableToTensor(const Table& table, bool null_to_nan, bool row_major,
+                     MemoryPool* pool, std::shared_ptr<Tensor>* tensor);

Review Comment:
   `TableToTensor` introduces `Table` in the `arrow/tensor.h` interface. Ensure 
`Table` is declared before use in this header (either by including the 
appropriate forward declaration/header). Without that, `arrow/tensor.h` can 
become order-dependent for consumers that previously didn't need 
`arrow/table.h`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to