trxcllnt commented on code in PR #378:
URL: https://github.com/apache/arrow-js/pull/378#discussion_r2875357204


##########
src/ipc/message.ts:
##########
@@ -29,6 +29,20 @@ import { ArrowJSON, ArrowJSONLike, ITERATOR_DONE, FileHandle } from '../io/inter
 /** @ignore */ const invalidMessageMetadata = (expected: number, actual: number) => `Expected to read ${expected} metadata bytes, but only read ${actual}.`;
 /** @ignore */ const invalidMessageBodyLength = (expected: number, actual: number) => `Expected to read ${expected} bytes for message body, but only read ${actual}.`;
 
+/**
+ * Maximum allowed metadata length (256 MB). This is a safeguard against corrupted
+ * files that could cause the reader to hang or consume excessive memory.
+ * @ignore
+ */
+const MAX_METADATA_LENGTH = 256 * 1024 * 1024;
+
+/**
+ * Maximum allowed message body length (2 GB). This is a safeguard against corrupted
+ * files that could cause the reader to hang or consume excessive memory.
+ * @ignore
+ */
+const MAX_BODY_LENGTH = 2 * 1024 * 1024 * 1024;
+

Review Comment:
   We can't add a timeout in Arrow, since we're just processing a stream the 
user gives us. We don't have enough context to know why a read call might take 
a certain amount of time. Is the file on a slow NFS drive? Is it a `fetch()` to 
a dynamic resource and the next batch hasn't been computed yet? Is the 
user's connection just slow?
   
   That said, you can add a timeout before or after Arrow's transform stream to 
terminate or cancel the read based on your users' expected behavior.
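
As a rough illustration of the "timeout before Arrow" option, here is a minimal sketch using only the standard Web Streams API. The helper name `withIdleTimeout` and its `ms` parameter are hypothetical and not part of Arrow; the sketch assumes a runtime with a global `ReadableStream` (modern browsers, Node 18+). It errors the stream and cancels the source if no chunk arrives within `ms` milliseconds.

```typescript
// Hypothetical helper, not part of Apache Arrow: wraps any ReadableStream and
// fails the read if the source stays silent for longer than `ms` milliseconds.
function withIdleTimeout<T>(source: ReadableStream<T>, ms: number): ReadableStream<T> {
  const reader = source.getReader();
  return new ReadableStream<T>({
    async pull(controller) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      // Rejects once the idle window elapses without a chunk.
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`No data received for ${ms} ms`)), ms);
      });
      try {
        const result = await Promise.race([reader.read(), timeout]);
        if (result.done) {
          controller.close();
        } else {
          controller.enqueue(result.value);
        }
      } catch (err) {
        // Cancel the upstream source so it can release its resources,
        // then rethrow so the wrapped stream errors for its consumer.
        reader.cancel(err).catch(() => {});
        throw err;
      } finally {
        clearTimeout(timer);
      }
    },
    cancel(reason) {
      // Propagate downstream cancellation to the original source.
      return reader.cancel(reason);
    },
  });
}
```

The wrapped stream can then be handed to Arrow in place of the original, for example wrapping `response.body` from a `fetch()` before passing it to the reader. The right timeout value depends on the application, which is exactly the context Arrow does not have.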



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
