alamb commented on code in PR #9869:
URL: https://github.com/apache/arrow-rs/pull/9869#discussion_r3348765569


##########
arrow-ipc/src/reader.rs:
##########
@@ -1794,6 +1794,16 @@ pub(crate) enum IpcMessage {
     },
 }
 
+/// Maximum bytes of IPC message metadata we will allocate up front from an
+/// untrusted header. Real Arrow IPC metadata is a FlatBuffer schema descriptor

Review Comment:
   How do we know that real IPC metadata is always under 16 MiB? This cap looks 
like it could return an error on valid inputs.



##########
arrow-ipc/src/reader.rs:
##########
@@ -1794,6 +1794,16 @@ pub(crate) enum IpcMessage {
     },
 }
 
+/// Maximum bytes of IPC message metadata we will allocate up front from an
+/// untrusted header. Real Arrow IPC metadata is a FlatBuffer schema descriptor
+/// — well under 1 MiB even for very wide tables — so this is a generous cap.
+const MAX_META_LEN: usize = 16 * 1024 * 1024;
+
+/// Maximum bytes of IPC message body we will allocate up front from an
+/// untrusted header. Single Arrow record batches above 2 GiB are rare in
+/// practice and would push past `usize`-on-32-bit anyway.

Review Comment:
   Even if they are rare in practice, that doesn't mean they are invalid. Won't 
this cap reject valid messages?
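
   One possible way to limit the up-front allocation without rejecting large
   but valid messages is to cap only the initial reservation and let the
   buffer grow as bytes actually arrive. The sketch below is illustrative
   only: `read_declared_len` and `MAX_UPFRONT_RESERVE` are hypothetical names
   that do not exist in arrow-ipc, and the 1 MiB figure is an arbitrary
   assumption.

```rust
use std::io::{self, Read};

/// Hypothetical cap on how much memory is reserved before any data is read.
/// This limits the initial reservation only, not the message size.
const MAX_UPFRONT_RESERVE: usize = 1024 * 1024;

/// Reads exactly `declared_len` bytes from `reader`.
///
/// The length from the untrusted header is used only as an upper bound on how
/// much to read. The buffer starts small and grows as data arrives, so a bogus
/// huge length costs at most `MAX_UPFRONT_RESERVE` bytes before the read fails.
fn read_declared_len<R: Read>(reader: R, declared_len: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(declared_len.min(MAX_UPFRONT_RESERVE));
    // `take` stops the read at `declared_len` bytes even if more are available.
    let read = reader.take(declared_len as u64).read_to_end(&mut buf)?;
    if read != declared_len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            format!("expected {declared_len} bytes, got {read}"),
        ));
    }
    Ok(buf)
}
```

   With this shape, a truncated or malicious stream that declares a multi-GiB
   length fails with `UnexpectedEof` once the input runs out, while a genuine
   large message is still read in full. The trade-off is extra reallocation
   and copying as the buffer grows.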


