kunwp1 opened a new issue, #3802:
URL: https://github.com/apache/texera/issues/3802

   # Problem
   
   Texera currently streams console messages from the backend to the frontend 
during workflow execution. These messages may include:
   - Standard print() outputs from operators.
   - Debug messages generated by the Python UDF operator debugger.
   
   The problem is that console messages can be extremely large, both in the 
length of individual messages and in their total number. This can exhaust 
frontend memory, since a browser tab typically has only a few gigabytes 
available.
   
   # Related Work
   
   [PR #3346](https://github.com/apache/texera/pull/3346)
   : Mitigates memory issues by truncating individual console messages that 
exceed a length threshold, and showing only the most recent messages 
(discarding earlier ones).
   
   [PR #3786](https://github.com/apache/texera/pull/3786)
   : Prevents truncation of debug messages, since full logs are necessary for 
user debugging.
   
   However, these two approaches conflict:
   - Users need full visibility of debug messages, but
   - The system must also protect the frontend from unbounded memory growth.
   
   Currently, there is no unified mechanism that satisfies both requirements.
   
   # Design
   We need a design that balances user visibility with memory safety. Two 
possible approaches have been identified:
   1. On-Demand Retrieval (Lazy Loading)
      - Store complete console messages on the backend.
      - Initially send truncated versions to the frontend.
      - Allow users to expand/traverse messages by sending a follow-up request 
to fetch the full content.
      - Pros: Memory-safe, and users can still access all content.
      - Cons: Requires additional backend–frontend request/response logic and 
UI complexity.
   
   2. Global Memory Budget with Selective Truncation
      - Do not truncate by default. Instead, maintain a global memory budget 
(e.g., 100 MB of console logs in the browser).
      - Once the budget is exceeded, evict or truncate older messages while 
retaining newer ones.
      - For rare cases where a single message exceeds the memory threshold 
(e.g., a print() of a massive object), truncate that individual message.
      - Debug messages are assumed not to hit this case. If users do produce 
unusually large single messages, it is their responsibility to shorten the 
output.
      - Pros: Simpler to implement (avoids extra backend requests) and aligns 
with user needs for complete debug logs.
      - Cons: Some data loss is possible when older messages are evicted or a 
single massive message is truncated.
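
   For comparison, approach 1 (on-demand retrieval) could look roughly like 
the following. This is a minimal TypeScript sketch under stated assumptions, 
not Texera code: `ConsoleMessageStore`, `MessagePreview`, `PREVIEW_LIMIT`, and 
`fetchRange` are hypothetical names, the 100-character preview size is 
arbitrary, and the in-memory map stands in for whatever backend storage would 
hold the complete messages.

```typescript
// Hypothetical names throughout; none of these are part of Texera's API.

interface MessagePreview {
  id: number;
  preview: string;    // at most PREVIEW_LIMIT characters
  fullLength: number; // length of the complete message
  truncated: boolean; // true when the preview is shorter than the message
}

const PREVIEW_LIMIT = 100; // assumed preview size, in characters

class ConsoleMessageStore {
  private readonly messages = new Map<number, string>();
  private nextId = 0;

  // Keep the complete message and return only a bounded preview for the UI.
  append(text: string): MessagePreview {
    const id = this.nextId++;
    this.messages.set(id, text);
    return {
      id,
      preview: text.slice(0, PREVIEW_LIMIT),
      fullLength: text.length,
      truncated: text.length > PREVIEW_LIMIT,
    };
  }

  // Serve a follow-up request for one slice of the full message, so the
  // frontend can page through large content instead of loading it at once.
  fetchRange(id: number, start: number, end: number): string {
    const text = this.messages.get(id);
    if (text === undefined) {
      throw new Error(`unknown console message id: ${id}`);
    }
    return text.slice(start, end);
  }
}

// A 250-character message: the UI first receives a 100-character preview,
// then requests the next 100 characters when the user expands it.
const store = new ConsoleMessageStore();
const preview = store.append("x".repeat(250));
const nextChunk = store.fetchRange(preview.id, PREVIEW_LIMIT, 2 * PREVIEW_LIMIT);
```

   The extra request/response path (`fetchRange` here) is the additional 
logic listed under the cons of this approach.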
   
   # Preferred Solution
   Adopt Design 2 (Global Memory Budget with Selective Truncation):
   - Simpler and more maintainable.
   - Ensures users can generally view full debug messages.
   - Protects the frontend against out-of-memory crashes.
   - Explicitly handles the edge case of oversized single messages with 
graceful truncation.
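
   The preferred design could be sketched as follows. This is a minimal 
TypeScript illustration, not the actual implementation: `ConsoleBuffer`, 
`budgetBytes`, `TRUNCATION_MARKER`, and `approxBytes` are hypothetical names, 
and memory use is approximated as 2 bytes per UTF-16 code unit, which is an 
assumption rather than an exact measure of browser memory. The sketch treats 
all messages uniformly and does not distinguish debug messages.

```typescript
// Hypothetical names throughout; none of these are part of Texera's API.

const TRUNCATION_MARKER = "...[truncated]";

// Assumption: approximate a string's footprint as 2 bytes per UTF-16 unit.
const approxBytes = (text: string): number => text.length * 2;

class ConsoleBuffer {
  private readonly budgetBytes: number;
  private messages: string[] = [];
  private usedBytes = 0;

  constructor(budgetBytes: number) {
    this.budgetBytes = budgetBytes;
  }

  push(text: string): void {
    let message = text;
    const maxChars = Math.floor(this.budgetBytes / 2);
    // Edge case: a single message larger than the whole budget is truncated
    // so that it fits, with a marker showing that content was dropped.
    if (message.length > maxChars) {
      const keep = Math.max(0, maxChars - TRUNCATION_MARKER.length);
      message = (message.slice(0, keep) + TRUNCATION_MARKER).slice(0, maxChars);
    }
    this.messages.push(message);
    this.usedBytes += approxBytes(message);
    // Evict the oldest messages until the buffer fits the budget again.
    while (this.usedBytes > this.budgetBytes) {
      const evicted = this.messages.shift();
      if (evicted === undefined) break;
      this.usedBytes -= approxBytes(evicted);
    }
  }

  snapshot(): readonly string[] {
    return [...this.messages];
  }

  bytesUsed(): number {
    return this.usedBytes;
  }
}

// A tiny 100-byte budget (50 characters) makes the behavior easy to follow;
// the issue suggests something on the order of 100 MB in practice.
const buffer = new ConsoleBuffer(100);
buffer.push("a".repeat(20)); // 40 bytes used
buffer.push("b".repeat(20)); // 80 bytes used
buffer.push("c".repeat(20)); // 120 bytes, so the oldest ("a...") is evicted
```

   Messages are never truncated while they fit the budget; only eviction of 
the oldest entries and the oversized-message case lose data.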

