lostluck commented on issue #33623:
URL: https://github.com/apache/beam/issues/33623#issuecomment-2802309243

   The only catch with the singleton approach, absent an occasional idle
timeout, is that Prism isn't yet set up to run indefinitely. So, a heads-up
about OOMs if users keep an instance around long term, which may happen
depending on how many iterations someone runs in Colab.
   
   1. Artifacts are kept in memory indefinitely:
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/jobservices/server.go#L63
   2. In principle, job metadata and pipeline protos are also kept in memory
indefinitely:
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/jobservices/server.go#L48
 
   
   I have a suggested immediate resolution, and a longer-term one:
   1. Recognize that when the UI isn't turned on, we likely only need to keep
active jobs in memory.
   2. Offload some of the "archive" data out of memory into a persistent
storage location, if one is configured.
   
    The "default" mode for Prism is really the SDK spinning it up, with Prism
lingering until the SDK process is done with it. Artifacts can be GC'd as soon
as the job is done, and metrics can probably stick around indefinitely, up to
some limit on the number of cached job stats.
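   A minimal Go sketch of that retention policy, assuming a hypothetical
`jobStore` (not Prism's actual job service types): active jobs hold their
artifacts, finishing a job drops the artifacts so they can be GC'd, and only
metrics for a bounded number of finished jobs are kept, evicting the oldest
first. Job IDs are assumed to be unique.

```go
package main

import (
	"fmt"
	"sync"
)

// jobRecord is a hypothetical stand-in for per-job state.
type jobRecord struct {
	artifacts map[string][]byte // only needed while the job runs
	metrics   map[string]int64  // worth keeping after completion
}

type jobStore struct {
	mu          sync.Mutex
	maxFinished int
	active      map[string]*jobRecord
	finished    map[string]map[string]int64
	order       []string // finished job IDs, oldest first
}

func newJobStore(maxFinished int) *jobStore {
	return &jobStore{
		maxFinished: maxFinished,
		active:      map[string]*jobRecord{},
		finished:    map[string]map[string]int64{},
	}
}

func (s *jobStore) Start(id string, artifacts map[string][]byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.active[id] = &jobRecord{artifacts: artifacts, metrics: map[string]int64{}}
}

func (s *jobStore) AddMetric(id, name string, delta int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if rec, ok := s.active[id]; ok {
		rec.metrics[name] += delta
	}
}

// Finish drops the job's artifacts and keeps only its metrics, evicting
// the oldest finished job once more than maxFinished are cached.
func (s *jobStore) Finish(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	rec, ok := s.active[id]
	if !ok {
		return
	}
	delete(s.active, id) // artifacts become unreachable and can be GC'd
	s.finished[id] = rec.metrics
	s.order = append(s.order, id)
	for len(s.order) > s.maxFinished {
		delete(s.finished, s.order[0])
		s.order = s.order[1:]
	}
}

func (s *jobStore) Metrics(id string) (map[string]int64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if rec, ok := s.active[id]; ok {
		return rec.metrics, true
	}
	m, ok := s.finished[id]
	return m, ok
}

func main() {
	s := newJobStore(2)
	for _, id := range []string{"job1", "job2", "job3"} {
		s.Start(id, map[string][]byte{"worker.jar": make([]byte, 1<<20)})
		s.AddMetric(id, "elements", 10)
		s.Finish(id)
	}
	_, ok := s.Metrics("job1")
	fmt.Println(ok) // false: evicted once job3 finished
	m, ok := s.Metrics("job3")
	fmt.Println(m["elements"], ok) // 10 true
}
```

   A simple FIFO cap is used here for brevity; an LRU or size-based limit
would fit the same shape.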
   
   That's also complementary to the long-term setup, but a long-term job is a
priority, and doing it right and completely is a much larger task (e.g. also
putting logs in durable storage, putting durable intermediates there for
larger jobs, restarting in-progress jobs, job update, etc.).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
