RobertLD commented on code in PR #49771:
URL: https://github.com/apache/arrow/pull/49771#discussion_r3416218431
##########
cpp/src/arrow/ipc/message.cc:
##########
@@ -563,7 +563,25 @@ Status DecodeMessage(MessageDecoder* decoder,
io::InputStream* file) {
}
auto metadata_length = decoder->next_required_size();
+
+ // "ARRO" (first 4 bytes of kArrowMagicBytes) as little-endian int32.
+ constexpr int32_t kArrowMagicPrefix = 0x4F525241;
+
+ // Did we misinterpret the metadata as a length?
+ if (metadata_length == kArrowMagicPrefix) {
+ constexpr std::string_view kRemainingMagic =
+ internal::kArrowMagicBytes.substr(sizeof(int32_t));
+ ARROW_ASSIGN_OR_RAISE(auto peek, file->Read(kRemainingMagic.size()));
+ if (peek->size() >= static_cast<int64_t>(kRemainingMagic.size()) &&
+ std::string_view(reinterpret_cast<const char*>(peek->data()),
+ kRemainingMagic.size()) == kRemainingMagic) {
Review Comment:
@pitrou Okay I see what you're saying and this was an oversight on my part.
My idea is to continue to let it fall through and add a separate check
immediately below it to reject metadata over 1GB. I think this is the best
solution but it does change the behavior of the library (and I'm concerned
there may be a single user somewhere who has 10Gb of metadata) and it's a bit
of stretch scope wise considering this ticket is about error messages.
What are your thoughts? I'm thinking I just go forward with the above (or
alternatively, just throw a failure with a different message on specifically
0x4F525241 even if it's not IPC)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]