lostluck commented on issue #30703: URL: https://github.com/apache/beam/issues/30703#issuecomment-2234368826
I had made a mistake earlier when I first looked at this. Here's how it's all going down:

1. Prism does serve the *job log* messages API from Job Management, allowing Job Messages to be sent back to the launching process if handled synchronously.
2. Prism hosts and *receives* Worker Logs through the Beam FnAPI Logging service, which is what's being printed out there. But currently this just logs the message on the prism side. https://github.com/apache/beam/blob/f3e6c66c0a5d3a8638fd94978adf503be5081274/sdks/go/pkg/beam/runners/prism/internal/worker/worker.go#L212
3. When launching a single Go SDK pipeline binary (as in your "stand alone binary"), prism runs entirely within the same process, making those logs visible in the launching command line.
4. When running the pipeline against a stand alone prism binary, the logs instead appear wherever the prism runner was launched.
5. There's no current Beam API for accessing these *worker logs*; how they're handled is ultimately determined by the runner (e.g. sent to Cloud Logging when running on Dataflow).

Running against a stand alone prism runner is technically the "common case" we're moving towards, since that affects support for the Java and Python SDKs (and future SDKs).

Ultimately, I'd like Prism to write such logs to files in some default directory, and to be able to surface them in the UI to some degree, e.g. select a ParDo and get the related log information displayed in the browser. This doesn't necessarily help the command line case, short of also printing something like "job data and logs are found at <path>" to the prism terminal. That requires moving forward on not keeping everything in memory, though.

I can see two options, and we may well need to do both.

#### Flags

I do need to see what's possible for the current Java and Python SDKs, but for the current Go SDK at least, we could add a job configuration option to route worker logs to the Job Log messages. This could be set as a stand alone prism flag for the process's default, and overridden on a per-job level via Pipeline Options.

That would then allow SDKs that download and start prism by default to set that configuration, since it would support most test uses, while also letting the user set it selectively on either the pipeline launch or the prism launch. This would also help with debugging when running in Docker containers.

#### Disable Logging in Loopback, and have SDKs default to StdOut in that case

The other option is to simply *not* send a logging address to the SDK in LOOPBACK mode. I believe the SDKs do fall back to StdOut and StdErr output for logs if there's no logging connection, but today that only happens on a connection failure. This would avoid round trips to the prism process, and rely on the SDK fallback behavior.

https://github.com/apache/beam/blob/f3e6c66c0a5d3a8638fd94978adf503be5081274/sdks/go/pkg/beam/core/runtime/harness/logging.go#L133

Basically, we'd check whether there's actually a logging endpoint here, and if it's empty, quietly write logs to StdOut instead. We'd likely need to do something similar for Java and Python, though.
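
To make the two options a bit more concrete, here are rough sketches in Go. All names in them (the flag, the option key, and the helper functions) are hypothetical and only illustrative; none of this is an existing Prism or Beam API. First, the Flags option: a process-wide prism flag for the default, overridable per job via a pipeline option.

```go
// Sketch only: the flag name, option key, and helper below are hypothetical,
// not an existing Prism or Beam API.
package prism

import (
	"flag"
	"strconv"
)

// Process-wide default, settable when launching a stand alone prism binary.
var workerLogsToJobMessages = flag.Bool(
	"worker_logs_to_job_messages", false,
	"Copy worker logs into the Job Messages stream by default.")

// routeWorkerLogsToJobMessages reports whether a given job's worker logs
// should be copied into the Job Messages stream. A per-job pipeline option,
// if present, overrides the process-wide flag default.
func routeWorkerLogsToJobMessages(jobOptions map[string]string) bool {
	if v, ok := jobOptions["worker_logs_to_job_messages"]; ok {
		if b, err := strconv.ParseBool(v); err == nil {
			return b
		}
	}
	return *workerLogsToJobMessages
}
```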
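
And for the loopback option, the change would roughly amount to checking for an empty logging endpoint before dialing the logging service, rather than only falling back after a failed connection. This is a stand-in for the shape of that fallback in logging.go, not the actual harness code.

```go
// Sketch only: hypothetical stand-ins for the harness logging setup, showing
// the shape of an "empty endpoint means local logs" fallback.
package harness

import (
	"context"
	"fmt"
	"os"
)

// dialLoggingService stands in for connecting to the Beam FnAPI Logging
// service; the real code lives in core/runtime/harness/logging.go.
func dialLoggingService(ctx context.Context, endpoint string) error {
	// ... establish the gRPC logging stream ...
	return nil
}

// connectLogging quietly keeps logs on StdOut when no logging endpoint was
// provided (e.g. prism in LOOPBACK mode with logging disabled), rather than
// only falling back after a failed connection attempt.
func connectLogging(ctx context.Context, endpoint string) {
	if endpoint == "" {
		fmt.Fprintln(os.Stdout, "no logging endpoint configured; worker logs go to stdout")
		return
	}
	if err := dialLoggingService(ctx, endpoint); err != nil {
		// Existing style of fallback: local output only after a failure.
		fmt.Fprintln(os.Stderr, "remote logging failed, falling back to stderr:", err)
	}
}
```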
