On Logging, I think if you are considering enabling concurrent activation 
processing, you will encounter that the only approach to parsing logs to be 
associated with a specific activationId, is to force the log output to be 
structured, and always include the activationId with every log message. This 
requires a change at the action container layer, but the simpler thing to do is 
to encourage action containers to provide a structured logging context that 
action developers can (and must) use to generate logs. 

An example is nodejs container - for the time being, we are hijacking the 
stdout/stderr and injecting the activationId when any developer code writes to 
stdout/stderr (as console.log/console.error). This may not work as simply in 
all action containers, and isn’t great even in nodejs. 

I would rather encourage action containers to provide a logging context, where 
action devs use: log.info, log.debug, etc, and this logging context does the 
needful to assert some structure to the log format. In general, many (most?) 
languages have conventions (slf4xyz, et al) for this already, and while you 
lose “random writes to stdout”, I haven’t seen this be an actual problem. 

If you don’t deal with interleaved logs (typically because activations don’t 
run concurrently), this this is less of an issue, but regardless, writing log 
parsers is a solved problem that would still be good to offload to external 
(not in OW controller/invoker) systems (logstash, fluentd, splunk, etc). This 
obviously comes with a caveat that logs parsing will be delayed, but that is OK 
from my point of view, partly because most logs will never be viewed, and 
partly because the log ingest systems are mostly fast enough already to limit 
this delay to seconds or milliseconds.  

Thanks
Tyson
> On Jul 20, 2018, at 8:46 AM, David P Grove <gro...@us.ibm.com> wrote:
> 
> 
> Rethinking the architecture to more fully exploit the capabilities of the
> underlying container orchestration platforms is pretty exciting.  I think
> there are lots of interesting ideas to explore about how best to schedule
> the workload.
> 
> As brought out in the architecture proposal [1], although it is logically
> an orthogonal issue, improving the log processing for user containers is a
> key piece of this roadmap.  The initial experiences with the
> KubernetesContainerFactory indicate that post-facto log enrichment to add
> the activation id to each log line is a serious bottleneck.  It adds
> complexity to the system and measurably reduces system performance by
> delaying the re-use of action containers until the logs can be extracted
> and processing.
> 
> I believe what we really want is to be using an openwhisk-aware log driver
> that will dynamically inject the current activation id into every log line
> as soon as it is written.  Then the user container logs, already properly
> enriched when they are generated, can be fed directly into the platform
> logging system with no post-processing needed.
> 
> If the low-level container runtime is docker 17.09 or better, I think we
> could probably achieve this by writing a logging driver plugin [2] that
> extends docker's json logging driver.  For non-blackbox containers, I think
> we "just" need the /run method to update a shared location that is
> accessible to the logging driver plugin with the current activation id
> before it invokes the user code.  As log lines are produced, that location
> is read and the string with the activation id gets injected into the json
> formatted log line as it is produced.   For blackbox containers, we could
> have our dockerskeleton do the same thing, but the user would have to opt
> in somehow to the protocol if they were using their own action runner.
> Warning:  I haven't looked into how flushing works with these drivers, so
> I'm not sure that this really works....we need to make sure we don't enrich
> a log line with the wrong activation id because of delayed flushing.
> 
> If we're running on Kubernetes, we might decide that instead of using a
> logging driver plugin, to use a streaming sidecar container as shown in [3]
> and have the controller interact with the sidecar to update the current
> activation id (or have the sidecar read it from a shared memory location
> that is updated by /run to minimize the differences between deployment
> platforms).  I'm not sure this really works as well, since the sidecar
> might fall behind in processing the logs, so we might still need a
> handshake somewhere.
> 
> A third option would be to extend our current sentineled log design by also
> writing a "START_WHISK_ACTIVATION_LOG <ACTIVATION_ID>" line in the /run
> method before invoking the user code.  We'd still have to post-process the
> log files, but it could be decoupled from the critical path since the
> post-processor would have the activation id available to it in the log
> files (and thus would not need to handshake with the controller at all,
> thus we could offload all logging to a node-level log processing/forwarding
> agent).
> 
> Option 3 would be really easy to implement and is independent of the
> details of the low-level log driver, but doesn't eliminate the need to
> post-process the logs. It just makes it easier to move that processing off
> any critical path.
> 
> Thoughts?
> 
> --dave
> 
> [1] 
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FOPENWHISK%2FOpenWhisk%2Bfuture&amp;data=02%7C01%7Ctnorris%40adobe.com%7C7c32fc01b3c44cfb7c3d08d5ee58a341%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C636676986817234391&amp;sdata=3CAF0psQZ%2BTQCOuhSycTUaTYfcTgh3eIPlw1IDYvLWU%3D&amp;reserved=0
> +architecture
> [2] 
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.docker.com%2Fv17.09%2Fengine%2Fadmin%2Flogging%2Fplugins%2F&amp;data=02%7C01%7Ctnorris%40adobe.com%7C7c32fc01b3c44cfb7c3d08d5ee58a341%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C636676986817234391&amp;sdata=%2FZjzrOh65Bak8iDMvk8FgftPDjwdJCEKvYzTPOy0o3U%3D&amp;reserved=0
> [3] 
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fkubernetes.io%2Fdocs%2Fconcepts%2Fcluster-administration%2Flogging%2F&amp;data=02%7C01%7Ctnorris%40adobe.com%7C7c32fc01b3c44cfb7c3d08d5ee58a341%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C636676986817234391&amp;sdata=DaYQhz70aF0TVgGpSqzhOZZQC8XNT0znAMMDbUkHlSo%3D&amp;reserved=0

Reply via email to