pitrou commented on code in PR #41117:
URL: https://github.com/apache/arrow/pull/41117#discussion_r1559177652


##########
cpp/src/arrow/io/compressed.cc:
##########
@@ -405,13 +412,22 @@ class CompressedInputStream::Impl {
     ARROW_ASSIGN_OR_RAISE(auto buf, AllocateResizableBuffer(nbytes, pool_));
     ARROW_ASSIGN_OR_RAISE(int64_t bytes_read, Read(nbytes, 
buf->mutable_data()));
     RETURN_NOT_OK(buf->Resize(bytes_read));
-    // Using std::move because the some compiler might has issue below:
+    // Using std::move because some compiler might has issue below:
     // https://wg21.cmeerw.net/cwg/issue1579
     return std::move(buf);
   }
 
   const std::shared_ptr<InputStream>& raw() const { return raw_; }
 
+ private:
+  int64_t CompressedBufferAvailable() const {

Review Comment:
   Can we name this `available_compressed_data`?



##########
cpp/src/arrow/io/compressed.cc:
##########
@@ -353,7 +355,10 @@ class CompressedInputStream::Impl {
 
   // Try to feed more data into the decompressed_ buffer.
   Status RefillDecompressed(bool* has_data) {
-    // First try to read data from the decompressor
+    // First try to read data from the decompressor.
+    // This doesn't use `CompressedBufferAvailable()` because when compressed_
+    // exists and available == 0, it might trigger  an empty decompress and set

Review Comment:
   It's not necessarily an empty decompress if the decompressor has its own 
internal buffer?



##########
cpp/src/arrow/io/compressed.cc:
##########
@@ -405,13 +412,22 @@ class CompressedInputStream::Impl {
     ARROW_ASSIGN_OR_RAISE(auto buf, AllocateResizableBuffer(nbytes, pool_));
     ARROW_ASSIGN_OR_RAISE(int64_t bytes_read, Read(nbytes, 
buf->mutable_data()));
     RETURN_NOT_OK(buf->Resize(bytes_read));
-    // Using std::move because the some compiler might has issue below:
+    // Using std::move because some compiler might has issue below:
     // https://wg21.cmeerw.net/cwg/issue1579
     return std::move(buf);
   }
 
   const std::shared_ptr<InputStream>& raw() const { return raw_; }
 
+ private:
+  int64_t CompressedBufferAvailable() const {
+    return compressed_ ? compressed_->size() - compressed_pos_ : 0;
+  }
+
+  int64_t DecompressedBufferAvailable() const {

Review Comment:
   Similarly, `available_decompressed_data`



##########
cpp/src/arrow/io/compressed.cc:
##########
@@ -405,13 +412,22 @@ class CompressedInputStream::Impl {
     ARROW_ASSIGN_OR_RAISE(auto buf, AllocateResizableBuffer(nbytes, pool_));
     ARROW_ASSIGN_OR_RAISE(int64_t bytes_read, Read(nbytes, 
buf->mutable_data()));
     RETURN_NOT_OK(buf->Resize(bytes_read));
-    // Using std::move because the some compiler might has issue below:
+    // Using std::move because some compiler might has issue below:

Review Comment:
   Can we simply try to remove the `std::move`? The linked issue seems very old.



##########
cpp/src/arrow/io/compressed.cc:
##########
@@ -353,7 +355,10 @@ class CompressedInputStream::Impl {
 
   // Try to feed more data into the decompressed_ buffer.
   Status RefillDecompressed(bool* has_data) {
-    // First try to read data from the decompressor
+    // First try to read data from the decompressor.
+    // This doesn't use `CompressedBufferAvailable()` because when compressed_
+    // exists and available == 0, it might trigger  an empty decompress and set

Review Comment:
   Trivially, if the decompressed data is 1MB of zeros, and we decompress 
`kDecompressSize` bytes at a time, `DecompressData` will still yield data even 
after the compressed data is finished.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to