lostluck commented on issue #30703:
URL: https://github.com/apache/beam/issues/30703#issuecomment-2234368826

   I had made a mistake earlier when I first looked at this. Here's how it's 
all going down:
   
   1. Prism does implement the *job log* messages API from Job Management, allowing Job Messages to be sent back to the launching process when handled synchronously. 
   2. Prism hosts and *receives* worker logs through the Beam FnAPI Logging service, which is what's being printed out there. But currently it just logs each message on the prism side.
   
   
https://github.com/apache/beam/blob/f3e6c66c0a5d3a8638fd94978adf503be5081274/sdks/go/pkg/beam/runners/prism/internal/worker/worker.go#L212
   
   3. When launching a single Go SDK pipeline binary (as in your "stand alone binary" case), prism runs entirely within the same process, making those logs visible on the launching command line.
   
   4. When running the pipeline against a stand alone prism binary, the logs instead appear wherever the prism binary was launched.
   
   5. There's no current Beam API for accessing these *worker logs*; their destination is ultimately determined by the runner (e.g. sent to Cloud Logging when running on Dataflow).
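   To make point 2 concrete, the prism-side handling amounts to re-logging each received entry through the runner process's own logger, so worker logs surface wherever prism runs. Here's a stdlib-only Go sketch; `LogEntry`, `formatEntry`, and `logToRunner` are simplified stand-ins for the actual FnAPI protos and prism's worker code, not the real types:

   ```go
   package main

   import (
   	"fmt"
   	"log"
   	"os"
   )

   // LogEntry is a simplified stand-in for the FnAPI LogEntry proto.
   type LogEntry struct {
   	Severity  string
   	Transform string // the PTransform the log line is associated with
   	Message   string
   }

   // formatEntry renders one worker log entry the way a runner-side
   // logger might print it.
   func formatEntry(e LogEntry) string {
   	return fmt.Sprintf("[%s] %s: %s", e.Severity, e.Transform, e.Message)
   }

   // logToRunner mimics the current prism behavior: each worker log
   // entry received over the logging stream is simply written to the
   // runner process's own logger.
   func logToRunner(l *log.Logger, entries []LogEntry) {
   	for _, e := range entries {
   		l.Print(formatEntry(e))
   	}
   }

   func main() {
   	runnerLog := log.New(os.Stdout, "prism: ", 0)
   	logToRunner(runnerLog, []LogEntry{
   		{Severity: "INFO", Transform: "myParDo", Message: "processing bundle"},
   	})
   }
   ```

   Because the sink is the runner's logger rather than anything job-scoped, the launching SDK process never sees these lines unless it happens to share a process with prism (point 3).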
   
   Running against a stand alone prism runner is technically the "common case" we're moving towards, since that's what supports the Java and Python SDKs (and future SDKs). 
   
   Ultimately, I'd like Prism to write such logs to files in some default directory, and to be able to surface them in the UI to some degree. E.g. select a ParDo and have the related log information displayed in the browser. That doesn't necessarily help the command line case, short of also printing something like "job data and logs are found at <path>" to the prism terminal. It does require moving forward on not keeping everything in memory, though.
   
   I can see two options, and we may well end up doing both.
   
   #### Flags
   
   I do need to see what's possible for the current Java and Python SDKs, but for the current Go SDK at least, we could add a job configuration option to route worker logs to the Job Log messages. This could be set as a flag on the stand alone prism binary for the process-wide default, and overridden at a per-job level via Pipeline Options.
   
   SDKs that download and start prism by default could then set that configuration, since it would cover most test uses, and users could set it selectively at either pipeline launch or prism launch.
   
   This would also help with debugging pipelines running in Docker containers.
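   A minimal sketch of the flag/option layering, assuming a hypothetical `route_worker_logs_to_job_messages` name for both the process flag and the pipeline option (neither exists today); the pipeline options are represented here as a plain string map:

   ```go
   package main

   import (
   	"flag"
   	"fmt"
   )

   // Hypothetical flag: the process-wide default for routing worker logs
   // into Job Log messages. Not an existing prism flag.
   var routeWorkerLogs = flag.Bool("route_worker_logs_to_job_messages", false,
   	"Default: forward FnAPI worker logs as Job Log messages.")

   // jobRoutesLogs computes the effective setting for one job: a per-job
   // pipeline option (a simple map stand-in here) overrides the process
   // default when present.
   func jobRoutesLogs(pipelineOpts map[string]string, processDefault bool) bool {
   	if v, ok := pipelineOpts["route_worker_logs_to_job_messages"]; ok {
   		return v == "true"
   	}
   	return processDefault
   }

   func main() {
   	flag.Parse()
   	// Job A inherits the process default; Job B overrides it per-job.
   	fmt.Println(jobRoutesLogs(nil, *routeWorkerLogs))
   	fmt.Println(jobRoutesLogs(
   		map[string]string{"route_worker_logs_to_job_messages": "true"},
   		*routeWorkerLogs))
   }
   ```

   The same two-level precedence (pipeline option wins over process flag) is what would let an SDK that auto-starts prism set a sensible test default while still letting users flip it for an individual job.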
   
   #### Disable Logging in Loopback, have SDKs default to StdOut in that case.
   
   The other option is to simply *not* send a logging address to the SDK in LOOPBACK mode. I believe the SDKs do fall back to StdOut and StdErr output for logs when there's no logging connection, but today only on connection failure. This would avoid round trips to the prism process and rely on the SDK fallback behavior.
   
   
https://github.com/apache/beam/blob/f3e6c66c0a5d3a8638fd94978adf503be5081274/sdks/go/pkg/beam/core/runtime/harness/logging.go#L133
   
   Basically we'd just check whether there's actually a logging endpoint here, and if it's empty, quietly write logs to StdOut instead. We'd likely need to do something similar for Java and Python, though.
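   A sketch of that check, with the remote branch stubbed out; `pickLogSink` is a hypothetical helper for illustration, not the actual harness code (the real harness dials a gRPC stream and sends FnAPI LogEntries when an endpoint is set):

   ```go
   package main

   import (
   	"fmt"
   	"io"
   	"os"
   )

   // pickLogSink sketches the proposed harness-side check: if the runner
   // supplied no logging endpoint, quietly fall back to StdOut instead of
   // attempting a logging connection. The remote case is a stub here.
   func pickLogSink(endpoint string) (io.Writer, string) {
   	if endpoint == "" {
   		return os.Stdout, "stdout-fallback"
   	}
   	// Real harness: dial the endpoint and stream FnAPI LogEntries.
   	return os.Stderr, "remote:" + endpoint
   }

   func main() {
   	// LOOPBACK mode under this proposal: no endpoint is provided.
   	w, mode := pickLogSink("")
   	fmt.Fprintln(w, "worker log line, sink mode:", mode)
   }
   ```

   The key property is that the empty-endpoint branch is a deliberate, silent fallback rather than an error path, which is what distinguishes this from the existing failure-only behavior.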

