lidavidm commented on a change in pull request #10663:
URL: https://github.com/apache/arrow/pull/10663#discussion_r670405214



##########
File path: cpp/src/arrow/flight/serialization_internal.cc
##########
@@ -164,26 +164,18 @@ static const uint8_t kPaddingBytes[8] = {0, 0, 0, 0, 0, 
0, 0, 0};
 
 // Update the sizes of our Protobuf fields based on the given IPC payload.
 grpc::Status IpcMessageHeaderSize(const arrow::ipc::IpcPayload& ipc_msg, bool 
has_body,
-                                  size_t* body_size, size_t* header_size,
-                                  int32_t* metadata_size) {
-  DCHECK_LT(ipc_msg.metadata->size(), kInt32Max);
+                                  size_t* header_size, int32_t* metadata_size) 
{
+  DCHECK_LE(ipc_msg.metadata->size(), kInt32Max);
   *metadata_size = static_cast<int32_t>(ipc_msg.metadata->size());
 
   // 1 byte for metadata tag
   *header_size += 1 + WireFormatLite::LengthDelimitedSize(*metadata_size);
 
-  for (const auto& buffer : ipc_msg.body_buffers) {
-    // Buffer may be null when the row length is zero, or when all
-    // entries are invalid.
-    if (!buffer) continue;
-
-    *body_size += 
static_cast<size_t>(BitUtil::RoundUpToMultipleOf8(buffer->size()));
-  }
-
   // 2 bytes for body tag
   if (has_body) {
     // We write the body tag in the header but not the actual body data
-    *header_size += 2 + WireFormatLite::LengthDelimitedSize(*body_size) - 
*body_size;
+    *header_size += 2 + 
WireFormatLite::LengthDelimitedSize(ipc_msg.body_length) -

Review comment:
       If you constructed your own IPC payload, I suppose that could be the 
case. But in general you'd use one of the GetFooPayload functions in 
arrow/ipc/writer.h, which precalculate the body length for you. (And it is 
valid to have a zero-length payload in general, since it might be metadata-only 
or IPC-header-only. The previous calculation here was solely to guard against a 
too-large payload, but the only thing we can do at this point is crash anyways.)

##########
File path: cpp/src/arrow/flight/types.cc
##########
@@ -21,6 +21,7 @@
 #include <sstream>
 #include <utility>
 
+#include "arrow/buffer.h"

Review comment:
       It's actually needed for the descriptor->size() call below.

##########
File path: cpp/src/arrow/flight/test_util.cc
##########
@@ -616,6 +640,22 @@ Status ExampleLargeBatches(BatchVector* out) {
   return Status::OK();
 }
 
+arrow::Result<std::shared_ptr<RecordBatch>> VeryLargeBatch() {
+  // In CI, some platforms don't let us allocate one very large
+  // buffer, so allocate a smaller buffer and repeat it a few times
+  constexpr int64_t nbytes = (1ul << 27ul) + 8ul;
+  constexpr int64_t nrows = nbytes / 8;
+  constexpr int64_t ncols = 16;
+  ARROW_ASSIGN_OR_RAISE(auto values, AllocateBuffer(nbytes));
+  std::memset(values->mutable_data(), 0x00, values->capacity());
+  std::vector<std::shared_ptr<Buffer>> buffers = {nullptr, std::move(values)};
+  auto array = std::make_shared<ArrayData>(int64(), nrows, buffers,
+                                           /*null_count=*/0);
+  std::vector<std::shared_ptr<ArrayData>> arrays(ncols, array);
+  std::vector<std::shared_ptr<Field>> fields(ncols, field("a", int64()));
+  return RecordBatch::Make(schema(std::move(fields)), nrows, 
std::move(arrays));

Review comment:
       Yes, and when written out, we'll write all 16 columns to the stream. 
(This was done just to save memory and try to get the test runnable on most of 
the CI environments.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to