Hey everyone, I put together a design doc on adding configurable timeouts to the RunInference transform, similar to the timeout option we have with with_exception_handling today. This is not trivial because any hanging model state, which may live in an entirely different process from the DoFn, needs to be cleaned up when a timeout fires. I'm proposing a mechanism for doing this cleanup as part of the timeout logic, while exposing the timeout the same way with_exception_handling exposes it today.
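As a rough illustration of why cleanup is the hard part (a generic stdlib-only sketch, not the mechanism proposed in the doc): Python threads cannot be forcibly stopped, so a timed-out inference call keeps running and a separate cleanup hook has to release whatever state it is holding. `HypotheticalModelHandle`, its `reset()` hook, and `predict_with_timeout` are hypothetical names used only for this example.

```python
import concurrent.futures
import time


class HypotheticalModelHandle:
    """Hypothetical stand-in for model state that lives outside the caller
    (in practice it might be held by a separate process)."""

    def __init__(self):
        self.resets = 0

    def predict(self, x, delay_s):
        # Simulates an inference call that may hang for delay_s seconds.
        time.sleep(delay_s)
        return x * 2

    def reset(self):
        # Hypothetical cleanup hook: a real implementation might kill and
        # reload the process that holds the model. Here we only count calls.
        self.resets += 1


def predict_with_timeout(model, x, delay_s, timeout_s):
    """Run model.predict in a worker thread; on timeout, run the cleanup hook."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(model.predict, x, delay_s)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The hung call keeps running in its thread; the cleanup hook is
        # where a real implementation would release the hanging state.
        model.reset()
        raise TimeoutError(f"inference exceeded {timeout_s}s") from None
    finally:
        # Don't block on a possibly hung worker thread.
        executor.shutdown(wait=False)
```

Here `predict_with_timeout(model, 3, delay_s=0.0, timeout_s=1.0)` returns 6, while a call that sleeps longer than `timeout_s` raises `TimeoutError` and triggers exactly one `reset()`.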
The doc is here: https://docs.google.com/document/d/19ves6iv-m_6DFmePJZqYpLm-bCooPu6wQ-Ti6kAl2Jo/edit?usp=sharing. Please let me know if you have any comments!

Thanks,
Danny