kunwp1 opened a new issue, #3802: URL: https://github.com/apache/texera/issues/3802
# Problem

Texera currently streams console messages from the backend to the frontend during workflow execution. These messages may include:

- Standard print() outputs from operators.
- Debug messages generated by the Python UDF operator debugger.

The problem is that console messages can be extremely large, both in terms of individual message length and total number of messages. This may cause a frontend out-of-memory issue, since browsers typically have limited memory of a few gigabytes.

# Related Work

[PR #3346](https://github.com/apache/texera/pull/3346): Mitigates memory issues by truncating individual console messages that exceed a length threshold, and showing only the most recent messages (discarding earlier ones).

[PR #3786](https://github.com/apache/texera/pull/3786): Prevents truncation of debug messages, since full logs are necessary for user debugging.

However, these two approaches conflict:

- Users need full visibility of debug messages, but
- The system must also protect the frontend from unbounded memory growth.

Currently, there is no unified mechanism that satisfies both requirements.

# Design

We need a design that balances user visibility with memory safety. Two possible approaches have been identified:

1. On-Demand Retrieval (Lazy Loading)
   - Store complete console messages on the backend.
   - Initially send truncated versions to the frontend.
   - Allow users to expand/traverse messages by sending a follow-up request to fetch the full content.
   - Pros: Memory-safe + Users can still access all content.
   - Cons: Requires additional backend–frontend request/response logic and UI complexity.
2. Global Memory Budget with Selective Truncation
   - Do not truncate by default. Instead, maintain a global memory budget (e.g., 100 MB of console logs in the browser).
   - Once the budget is exceeded, evict or truncate older messages while retaining newer ones.
   - For rare cases where a single message exceeds the memory threshold (e.g., a print() of a massive object), truncate that individual message.
   - Debug messages are assumed not to reach this size. If users produce unusually large single messages, the responsibility shifts to them to shorten outputs.
   - Pros: Simpler to implement (avoids extra backend requests) + Aligns with user needs for complete debug logs.
   - Cons: Some data loss is possible if a single massive message is truncated.

# Preferred Solution

Adopt Design 2:

- Simpler and more maintainable.
- Ensures users can generally view full debug messages.
- Protects the frontend against out-of-memory crashes.
- Explicitly handles the edge case of oversized single messages with graceful truncation.
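
The eviction policy described in Design 2 could look roughly like the sketch below. It is only an illustration under stated assumptions, not Texera's implementation: `ConsoleBuffer`, `ConsoleMessage`, `estimateBytes`, and both size limits are hypothetical names, and the byte count is a rough estimate of 2 bytes per UTF-16 code unit.

```typescript
// Hypothetical sketch of a global memory budget for console messages.
// None of these names come from the Texera codebase.
interface ConsoleMessage {
  id: number;
  text: string;
  truncated: boolean; // true if the message was cut to fit the per-message cap
}

// Rough estimate: JavaScript strings are UTF-16, so count 2 bytes per code unit.
function estimateBytes(text: string): number {
  return text.length * 2;
}

class ConsoleBuffer {
  private messages: ConsoleMessage[] = [];
  private usedBytes = 0;
  private nextId = 0;
  private readonly budgetBytes: number;
  private readonly maxMessageBytes: number;

  constructor(budgetBytes: number, maxMessageBytes: number) {
    this.budgetBytes = budgetBytes;
    // A single message may never be larger than the whole budget.
    this.maxMessageBytes = Math.min(maxMessageBytes, budgetBytes);
  }

  add(text: string): void {
    let stored = text;
    let truncated = false;
    // Edge case: truncate a single oversized message.
    // (slice may split a surrogate pair; acceptable for a sketch.)
    if (estimateBytes(stored) > this.maxMessageBytes) {
      stored = stored.slice(0, Math.floor(this.maxMessageBytes / 2));
      truncated = true;
    }
    this.messages.push({ id: this.nextId++, text: stored, truncated });
    this.usedBytes += estimateBytes(stored);
    // Evict the oldest messages until the total fits the budget again.
    while (this.usedBytes > this.budgetBytes && this.messages.length > 1) {
      const evicted = this.messages.shift()!;
      this.usedBytes -= estimateBytes(evicted.text);
    }
  }

  snapshot(): readonly ConsoleMessage[] {
    return this.messages;
  }

  bytesUsed(): number {
    return this.usedBytes;
  }
}
```

With a budget of 100 MB this would be constructed as `new ConsoleBuffer(100 * 1024 * 1024, perMessageCap)`, where `perMessageCap` is whatever single-message threshold is chosen. Evicting whole messages from the front keeps the newest output intact, which matches the goal of showing complete recent debug logs.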
