amol- commented on issue #43892:
URL: https://github.com/apache/arrow/issues/43892#issuecomment-2354183047
This seems to be related to the Read operation not getting properly
aborted.
The `TransformInputStream::Read` method doesn't do anything special to
handle the case where the transformer has failed, so it isn't immediately
obvious where the read operation would get aborted in such a case.
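To make the concern concrete, here is a minimal standalone sketch. It does not use Arrow at all: `ToyResult`, `ToyTransformStream` and `StreamStaysOpenAfterFailure` are hypothetical names invented for illustration, and the `closed_` flag is only an assumed stand-in for whatever state a real stream tracks. It shows a `Read` that hands the transformer's error back to the caller without touching the stream's own state.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; these are not Arrow's classes.
struct ToyResult {
  bool ok;
  std::string value;  // transformed data on success, error message on failure
};

class ToyTransformStream {
 public:
  using TransformFunc = std::function<ToyResult(const std::string&)>;

  explicit ToyTransformStream(TransformFunc fn) : transform_(std::move(fn)) {}

  // The transformer's result (success or error) is returned as-is.
  // Nothing here closes or aborts the stream when the transformer fails.
  ToyResult Read(const std::string& chunk) {
    if (closed_) return ToyResult{false, "stream is closed"};
    return transform_(chunk);
  }

  bool closed() const { return closed_; }

 private:
  TransformFunc transform_;
  bool closed_ = false;
};

// Returns true when a failed transform leaves the stream open.
inline bool StreamStaysOpenAfterFailure() {
  ToyTransformStream stream(
      [](const std::string&) { return ToyResult{false, "transform failed"}; });
  const ToyResult result = stream.Read("abc");
  assert(!result.ok);  // the error does reach the caller
  return !stream.closed();
}
```

In this toy model the caller sees the error, but the stream itself is still open afterwards, which is the situation described above.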
The following patch seemed to fix the issue for me:
```
diff --git a/cpp/src/arrow/io/transform.cc b/cpp/src/arrow/io/transform.cc
index 3fdf5a7a9..a8c40ee53 100644
--- a/cpp/src/arrow/io/transform.cc
+++ b/cpp/src/arrow/io/transform.cc
@@ -102,7 +102,11 @@ Result<int64_t> TransformInputStream::Read(int64_t nbytes, void* out) {
     const bool have_eof = (buf->size() == 0);
     // Even if EOF is met, let the transform function run a last time
     // (for example to flush internal buffers)
-    ARROW_ASSIGN_OR_RAISE(buf, impl_->transform_(std::move(buf)));
+    auto transform_status = impl_->transform_(std::move(buf));
+    if (!transform_status.ok()) {
+      RETURN_NOT_OK(this->Abort());
+      RETURN_NOT_OK(transform_status);
+    }
     avail_size += buf->size();
     avail.push_back(std::move(buf));
     if (have_eof) {
```
but someone who is more familiar with the IO part of the codebase may
need to check this in more detail.
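The abort-on-failure idea in the patch can be sketched outside Arrow as follows. Everything here is hypothetical (`SketchResult`, `AbortingTransformStream`, `FailOnBad`), and the sketch assumes that aborting simply marks the stream as closed so that later reads fail. One detail worth noting when reviewing the patch: the removed `ARROW_ASSIGN_OR_RAISE` line also stored the transformer's output back into `buf`, so the sketch keeps that assignment explicit on the success path.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; these are not Arrow's classes.
struct SketchResult {
  bool ok;
  std::string value;  // transformed data on success, error message on failure
};

class AbortingTransformStream {
 public:
  using TransformFunc = std::function<SketchResult(const std::string&)>;

  explicit AbortingTransformStream(TransformFunc fn)
      : transform_(std::move(fn)) {}

  SketchResult Read(std::string buf) {
    if (closed_) return SketchResult{false, "stream is closed"};
    SketchResult transformed = transform_(buf);
    if (!transformed.ok) {
      Abort();             // assumed semantics: mark the stream unusable
      return transformed;  // then propagate the original error
    }
    // Keep the transformer's output, as the original assignment did.
    buf = std::move(transformed.value);
    return SketchResult{true, std::move(buf)};
  }

  void Abort() { closed_ = true; }
  bool closed() const { return closed_; }

 private:
  TransformFunc transform_;
  bool closed_ = false;
};

// Example transformer: appends "!" to the input, and fails on "bad".
inline SketchResult FailOnBad(const std::string& in) {
  if (in == "bad") return SketchResult{false, "transform failed"};
  return SketchResult{true, in + "!"};
}
```

With this shape, a read after a failed transform reports that the stream is closed instead of attempting to continue.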