Rethinking the architecture to more fully exploit the capabilities of the underlying container orchestration platforms is pretty exciting. I think there are lots of interesting ideas to explore about how best to schedule the workload.
As noted in the architecture proposal [1], improving log processing for user containers is a key piece of this roadmap, even though it is logically an orthogonal issue. Initial experience with the KubernetesContainerFactory indicates that post-facto log enrichment, adding the activation id to each log line after the fact, is a serious bottleneck. It adds complexity to the system and measurably reduces system performance by delaying the re-use of action containers until the logs can be extracted and processed.

I believe what we really want is an openwhisk-aware log driver that dynamically injects the current activation id into every log line as soon as it is written. Then the user container logs, already properly enriched when they are generated, can be fed directly into the platform logging system with no post-processing needed.

If the low-level container runtime is docker 17.09 or newer, I think we could achieve this by writing a logging driver plugin [2] that extends docker's json logging driver. For non-blackbox containers, I think we "just" need the /run method to update a shared location, accessible to the logging driver plugin, with the current activation id before it invokes the user code. As log lines are produced, the plugin reads that location and injects the activation id into each json-formatted log line. For blackbox containers, we could have our dockerskeleton do the same thing, but users running their own action runner would have to opt in to the protocol somehow. Warning: I haven't looked into how flushing works with these drivers, so I'm not sure this really works... we need to make sure we don't enrich a log line with the wrong activation id because of delayed flushing.
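To make the enrichment step concrete, here's a rough Go sketch of what the plugin would do to each line. The log/stream/time fields are docker's json-file format; the activationId field name and the enrich helper are just placeholders I made up, not an actual implementation:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// logEntry mirrors docker's json-file log format, plus a hypothetical
// activationId field that our plugin would add.
type logEntry struct {
	Log          string `json:"log"`
	Stream       string `json:"stream"`
	Time         string `json:"time"`
	ActivationID string `json:"activationId,omitempty"`
}

// enrich parses one raw json log line and stamps it with the activation id
// read from the shared location updated by /run.
func enrich(raw []byte, activationID string) ([]byte, error) {
	var e logEntry
	if err := json.Unmarshal(raw, &e); err != nil {
		return nil, err
	}
	e.ActivationID = activationID
	return json.Marshal(&e)
}

func main() {
	line := []byte(`{"log":"hello from the action\n","stream":"stdout","time":"2018-03-05T12:00:00Z"}`)
	enriched, err := enrich(line, "a1b2c3d4")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(enriched))
}
```

The point being that the driver only needs a read of the shared location per line; no handshake with the invoker is on the critical path.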
If we're running on Kubernetes, we might decide to use a streaming sidecar container, as shown in [3], instead of a logging driver plugin, and have the controller interact with the sidecar to update the current activation id (or have the sidecar read it from a shared memory location updated by /run, to minimize the differences between deployment platforms). I'm not sure this works as well, since the sidecar might fall behind in processing the logs, so we might still need a handshake somewhere.

A third option would be to extend our current sentineled log design by also writing a "START_WHISK_ACTIVATION_LOG <ACTIVATION_ID>" line in the /run method before invoking the user code. We'd still have to post-process the log files, but that work could be decoupled from the critical path: the post-processor would have the activation id available in the log files themselves, would not need to handshake with the controller at all, and thus we could offload all logging to a node-level log processing/forwarding agent. Option 3 would be really easy to implement and is independent of the details of the low-level log driver, but it doesn't eliminate the need to post-process the logs. It just makes it easier to move that processing off any critical path.

Thoughts?

--dave

[1] https://cwiki.apache.org/confluence/display/OPENWHISK/OpenWhisk+future+architecture
[2] https://docs.docker.com/v17.09/engine/admin/logging/plugins/
[3] https://kubernetes.io/docs/concepts/cluster-administration/logging/
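P.S. To illustrate option 3, here's a rough Go sketch of the decoupled post-processor: it walks the raw log lines, remembers the id from the most recent sentinel line, and tags every ordinary line with it. Types and names here are hypothetical, not proposed code:

```go
package main

import (
	"fmt"
	"strings"
)

// Sentinel written by /run before invoking the user code (option 3).
const startSentinel = "START_WHISK_ACTIVATION_LOG "

// taggedLine pairs a raw log line with the activation id it belongs to.
type taggedLine struct {
	ActivationID string
	Line         string
}

// postProcess tags each ordinary line with the activation id announced by
// the most recent sentinel line; no controller handshake is needed.
func postProcess(lines []string) []taggedLine {
	var out []taggedLine
	current := ""
	for _, l := range lines {
		if strings.HasPrefix(l, startSentinel) {
			current = strings.TrimPrefix(l, startSentinel)
			continue
		}
		out = append(out, taggedLine{ActivationID: current, Line: l})
	}
	return out
}

func main() {
	raw := []string{
		"START_WHISK_ACTIVATION_LOG aaa111",
		"hello from activation one",
		"START_WHISK_ACTIVATION_LOG bbb222",
		"hello from activation two",
	}
	for _, t := range postProcess(raw) {
		fmt.Printf("[%s] %s\n", t.ActivationID, t.Line)
	}
}
```

Since everything the post-processor needs is in the log stream itself, this loop could run in a node-level forwarding agent, completely off the invoker's critical path.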