Copilot commented on code in PR #49771:
URL: https://github.com/apache/arrow/pull/49771#discussion_r3107520227
##########
cpp/src/arrow/ipc/reader.cc:
##########
@@ -1890,7 +1890,9 @@ class RecordBatchFileReaderImpl : public
RecordBatchFileReader {
const auto magic_start = buffer->data() + sizeof(int32_t);
if (std::string_view(reinterpret_cast<const char*>(magic_start),
kMagicSize) !=
kArrowMagicBytes) {
- return Status::Invalid("Not an Arrow file");
+ return Status::Invalid(
+ "Not an Arrow file. If this is an Arrow IPC Streaming format
file, try "
+ "open_stream() instead.");
Review Comment:
The error message suggests calling `open_stream()`, but this is not a C++
API entry point (C++ uses `RecordBatchStreamReader::Open` / stream reader
types). Since this Status message is emitted from the C++ library and may be
surfaced to C++ callers as well as bindings, consider making the guidance
API-neutral (e.g., refer to the IPC stream reader) or mention the C++ API name
explicitly.
```suggestion
"Not an Arrow file. If this is an Arrow IPC streaming format
file, use "
"the IPC stream reader instead.");
```
##########
cpp/src/arrow/ipc/message.cc:
##########
@@ -565,6 +565,17 @@ Status DecodeMessage(MessageDecoder* decoder,
io::InputStream* file) {
auto metadata_length = decoder->next_required_size();
ARROW_ASSIGN_OR_RAISE(auto metadata, file->Read(metadata_length));
if (metadata->size() != metadata_length) {
+ // The first sizeof(int32_t) bytes of the Arrow file magic ("ARRO") may
have been
+ // misread as metadata_length. Check if the remaining bytes complete the
magic.
+ const auto remaining_magic =
internal::kArrowMagicBytes.substr(sizeof(int32_t));
+ if (metadata->size() >= static_cast<int64_t>(remaining_magic.size()) &&
+ std::string_view(reinterpret_cast<const char*>(metadata->data()),
+ remaining_magic.size()) == remaining_magic) {
+ return Status::Invalid("Expected to read ", metadata_length,
+ " metadata bytes, but only read ",
metadata->size(),
+ ". This appears to be an Arrow IPC File format
file. "
+ "Try open_file() instead of open_stream().");
Review Comment:
This error text is emitted from the C++ IPC decoder but recommends
`open_file()` / `open_stream()`, which are binding-specific names (they don't
exist in the C++ API). Consider API-neutral guidance (IPC file reader vs IPC
stream reader) or include the C++ entry points (e.g.,
`RecordBatchFileReader::Open` / `RecordBatchStreamReader::Open`). Also, "Arrow
IPC File format file" repeats "file"; reword to avoid the redundancy.
```suggestion
". This appears to be an Arrow IPC file. "
"Try the IPC file reader instead of the IPC
stream reader.");
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]