anjakefala commented on PR #40565:
URL: https://github.com/apache/arrow/pull/40565#issuecomment-2008319524
@pitrou Yes, indeed it does!
I looked at my original code, which was freezing at the
`RETURN_NOT_OK(memo_table.GetOrInsert(value, &memo_index));` line:
```
inline Status ConvertAsPyObjects(const PandasOptions& options, const ChunkedArray& data,
                                 WrapFunction&& wrap_func, PyObject** out_values) {
  using ArrayType = typename TypeTraits<Type>::ArrayType;
  using Scalar = typename MemoizationTraits<Type>::Scalar;
  std::function<Status(const typename MemoizationTraits<Type>::Scalar&, PyObject**)> WrapFunc;

  if (options.deduplicate_objects) {
    ::arrow::internal::ScalarMemoTable<Scalar> memo_table(options.pool);
    std::vector<PyObject*> unique_values;
    int32_t memo_size = 0;

    WrapFunc = [&](const Scalar& value, PyObject** out_values) {
      int32_t memo_index;
      RETURN_NOT_OK(memo_table.GetOrInsert(value, &memo_index));
      if (memo_index == memo_size) {
        // New entry
        RETURN_NOT_OK(wrap_func(value, out_values));
        unique_values.push_back(*out_values);
        ++memo_size;
      } else {
        // Duplicate entry
        Py_INCREF(unique_values[memo_index]);
        *out_values = unique_values[memo_index];
      }
      return Status::OK();
    };
  } else {
    WrapFunc = [&](const Scalar& value, PyObject** out_values) {
      return wrap_func(value, out_values);
    };
  }

  for (int c = 0; c < data.num_chunks(); c++) {
    const auto& arr = arrow::internal::checked_cast<const ArrayType&>(*data.chunk(c));
    RETURN_NOT_OK(internal::WriteArrayObjects(arr, WrapFunc, out_values));
    out_values += arr.length();
  }
  return Status::OK();
}
```
The main difference seems to be that you introduced `convert_chunks` and
moved the for loop into it. Do you know why that helped fix the freezing?
Anyway, this works great for my needs! Thank you!
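
One possible explanation, offered as an assumption and not confirmed in this
thread: in the code above, `memo_table`, `unique_values`, and `memo_size` are
locals of the `if` block, the lambda captures them by reference, and the lambda
is only called later by the for loop, after that block has ended. Running the
loop through a helper that is invoked inside the block keeps those locals alive
for every call. Below is a minimal standalone sketch of that shape. It uses no
Arrow or Python APIs; `MemoTable` and `ConvertChunks` are hypothetical
stand-ins, and only the name `convert_chunks` is taken from the PR.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a memo table: maps each distinct value to the
// index at which it was first inserted.
class MemoTable {
 public:
  int32_t GetOrInsert(const std::string& value) {
    auto it = index_.find(value);
    if (it != index_.end()) return it->second;
    const int32_t idx = static_cast<int32_t>(index_.size());
    index_.emplace(value, idx);
    return idx;
  }

 private:
  std::unordered_map<std::string, int32_t> index_;
};

// Hypothetical driver: emits one output index per input value. With
// deduplication, repeated values reuse the index of their first occurrence.
std::vector<int32_t> ConvertChunks(
    const std::vector<std::vector<std::string>>& chunks, bool deduplicate) {
  std::vector<int32_t> out;

  // The loop lives in a helper so each branch can call it while the state
  // captured by its callback is still in scope.
  auto convert_chunks =
      [&](const std::function<int32_t(const std::string&)>& wrap) {
        for (const auto& chunk : chunks) {
          for (const auto& value : chunk) out.push_back(wrap(value));
        }
      };

  if (deduplicate) {
    MemoTable memo_table;  // outlives every call made by convert_chunks below
    convert_chunks(
        [&](const std::string& v) { return memo_table.GetOrInsert(v); });
  } else {
    int32_t next = 0;
    convert_chunks([&](const std::string&) { return next++; });
  }
  return out;
}
```

Calling `ConvertChunks` with the chunks `{"a", "b"}` and `{"a", "c"}` and
`deduplicate = true` gives the second "a" the same index as the first one.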