paleolimbot commented on code in PR #49771:
URL: https://github.com/apache/arrow/pull/49771#discussion_r3203177106
##########
cpp/src/arrow/ipc/message.cc:
##########
@@ -565,6 +565,17 @@ Status DecodeMessage(MessageDecoder* decoder,
io::InputStream* file) {
auto metadata_length = decoder->next_required_size();
ARROW_ASSIGN_OR_RAISE(auto metadata, file->Read(metadata_length));
if (metadata->size() != metadata_length) {
+ // The first sizeof(int32_t) bytes of the Arrow file magic ("ARRO") may have been
+ // misread as metadata_length. Check if the remaining bytes complete the magic.
Review Comment:
In nanoarrow we check the first few bytes for the magic string and, if it is
present, skip them, then attempt to read the rest of the input as an IPC
stream. We've never received a complaint about this not working, but I'm not
sure how widespread the usage is. If a problem did come up, we could add an
option to turn it off or improve the error that results. I think 1330795073
bytes of metadata (the value of "ARRO" read as a little-endian int32) would
never reasonably occur on purpose.
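
To make the arithmetic concrete, here is a rough sketch of the kind of check
described above. It is not the nanoarrow or Arrow C++ implementation; the
helper names (ReadInt32LE, LeadingMagicBytesToSkip) are hypothetical, and it
assumes the file-format layout of "ARROW1" followed by two padding bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Arrow file magic: the six bytes "ARROW1", padded to an 8-byte boundary.
constexpr char kArrowMagic[] = "ARROW1";
constexpr std::size_t kArrowMagicSize = 6;
constexpr std::size_t kPaddedMagicSize = 8;

// Interpret four bytes as a little-endian int32, independent of host endianness.
// (Hypothetical helper, named here for illustration only.)
inline std::int32_t ReadInt32LE(const unsigned char* p) {
  const std::uint32_t v = static_cast<std::uint32_t>(p[0]) |
                          (static_cast<std::uint32_t>(p[1]) << 8) |
                          (static_cast<std::uint32_t>(p[2]) << 16) |
                          (static_cast<std::uint32_t>(p[3]) << 24);
  return static_cast<std::int32_t>(v);
}

// Return how many leading bytes to skip before treating the rest of the
// buffer as an IPC stream: 8 if the buffer starts with the padded file
// magic, 0 otherwise. (Hypothetical helper, not a real library function.)
inline std::size_t LeadingMagicBytesToSkip(const unsigned char* data,
                                           std::size_t size) {
  assert(data != nullptr || size == 0);
  if (size >= kPaddedMagicSize &&
      std::memcmp(data, kArrowMagic, kArrowMagicSize) == 0) {
    return kPaddedMagicSize;
  }
  return 0;
}
```

Calling ReadInt32LE on a buffer that begins with "ARRO" yields 1330795073
(0x4F525241), which is the bogus metadata_length a stream reader would see if
it were handed an Arrow file without any such check.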