ppkarwasz commented on code in PR #2715: URL: https://github.com/apache/logging-log4j2/pull/2715#discussion_r1667084479
########## src/site/antora/modules/ROOT/pages/manual/cloud.adoc: ########## @@ -15,549 +15,267 @@ limitations under the License. //// -= Using Log4j in Cloud Enabled Applications - -== The Twelve-Factor Application - -The Logging Guidelines for https://12factor.net/logs[The Twelve-Factor App] state that all logs should be routed -unbuffered to stdout. Since this is the least common denominator it is guaranteed to work for all applications. However, -as with any set of general guidelines, choosing the least common denominator approach comes at a cost. Some of the costs -in Java applications include: - -. Java stack traces are multi-line log messages. The standard docker log driver cannot handle these properly. See -https://github.com/moby/moby/issues/22920[Docker Issue #22920] which was closed with the message "Don't Care". -Solutions for this are to: -.. Use a docker log driver that does support multi-line log message, -.. Use a logging format that does not produce multi-line messages, -.. Log from Log4j directly to a logging forwarder or aggregator and bypass the docker logging driver. -. When logging to stdout in Docker, log events pass through Java's standard output handling which is then directed -to the operating system so that the output can be piped into a file. The overhead of all this is measurably slower -than just writing directly to a file as can be seen in these benchmark results where logging -to stdout is 16-20 times slower over repeated runs than logging directly to the file. The results below were obtained by -running the https://github.com/apache/logging-log4j2/blob/release-2.x/log4j-perf-test/src/main/java/org/apache/logging/log4j/perf/jmh/OutputBenchmark.java[Output Benchmark] -on a 2018 MacBook Pro with a 2.9GHz Intel Core i9 processor and a 1TB SSD. However, these results alone would not be -enough to argue against writing to the standard output stream as they only amount to about 14-25 microseconds -per logging call vs 1.5 microseconds when writing to the file. -+ -[source] ---- - Benchmark Mode Cnt Score Error Units - OutputBenchmark.console thrpt 20 39291.885 ± 3370.066 ops/s - OutputBenchmark.file thrpt 20 654584.309 ± 59399.092 ops/s - OutputBenchmark.redirect thrpt 20 70284.576 ± 7452.167 ops/s ---- -. When performing audit logging using a framework such as log4j-audit guaranteed delivery of the audit events -is required. Many of the options for writing the output, including writing to the standard output stream, do -not guarantee delivery. In these cases the event must be delivered to a "forwarder" that acknowledges receipt -only when it has placed the event in durable storage, such as what https://flume.apache.org/[Apache Flume] will do. += Integrating with service-oriented architectures -== Logging Approaches +On this page we share <<best-practices,best practices>> you can employ to integrate applications using Log4j Core with service-oriented architectures. +Along the way, we also provide guides for some popular scenarios. -All the solutions discussed on this page are predicated with the idea that log files cannot permanently -reside on the file system and that all log events should be routed to one or more log analysis tools that will -be used for reporting and alerting. There are many ways to forward and collect events to be sent to the -log analysis tools. 
+[#motivation] +== Motivation -Note that any approach that bypasses Docker's logging drivers requires Log4j's -https://logging.apache.org/log4j/2.x/manual/lookups.html#DockerLookup[Docker Lookup] to allow Docker attributes to be injected into the log events. +Most modern software is deployed in https://en.wikipedia.org/wiki/Service-oriented_architecture[service-oriented architectures]. +This is a very broad domain and can be realized in an amazingly large number of ways. +Nevertheless, they all redefine the notion of an application: -=== Logging to the Standard Output Stream +* Deployed in *multiple instances* +* Situated in *multiple locations*; either in the same rack, or in different data centers located in different continents +* Hosted by *multiple platforms*; hardware, virtual machine, container, etc. +* *Polyglot*; a product of multiple programming languages +* *Scaled* on demand; instances come and go in time -As discussed above, this is the recommended 12-Factor approach for applications running in a docker container. -The Log4j team does not recommend this approach for performance reasons. +Naturally, logging systems also evolved to accommodate these needs. +In particular, the old practice of _"monoliths writing logs to files rotated daily"_ has changed in two major ways: -image:DockerStdout.png[Stdout, "Application Logging to the Standard Output Stream"] +Application delivers logs differently:: -=== Logging to the Standard Output Stream with the Docker Fluentd Logging Driver +Applications no longer write logs to files, but <<structured-encoding,encode them structurally>>, and deliver them to a centrally managed external system. +Most of the time this is a <<proxy,proxy>> (a library, a sidecar container, etc.) that takes care of discovering the log storage system and determining the right external service to forward the logs to. -Docker provides alternate https://docs.docker.com/config/containers/logging/configure/[logging drivers], -such as https://docs.docker.com/config/containers/logging/fluentd/[fluentd], that -can be used to redirect the standard output stream to a log forwarder or log aggregator. +Platform stores logs differently:: -When routing to a log forwarder it is expected that the forwarder will have the same lifetime as the -application. If the forwarder should fail the management tools would be expected to also terminate -other containers dependent on the forwarder. +There is no longer `/var/log/tomcat/app.log` combining all logs of a monolith. Review Comment: _Nit_: Most Linux distributions use `/var/log/tomcat/catalina.out` for `stdout/stderr`. ########## src/site/antora/modules/ROOT/pages/manual/cloud.adoc: ########## @@ -15,549 +15,267 @@ limitations under the License. //// -= Using Log4j in Cloud Enabled Applications - -== The Twelve-Factor Application - -The Logging Guidelines for https://12factor.net/logs[The Twelve-Factor App] state that all logs should be routed -unbuffered to stdout. Since this is the least common denominator it is guaranteed to work for all applications. However, -as with any set of general guidelines, choosing the least common denominator approach comes at a cost. Some of the costs -in Java applications include: - -. Java stack traces are multi-line log messages. The standard docker log driver cannot handle these properly. See -https://github.com/moby/moby/issues/22920[Docker Issue #22920] which was closed with the message "Don't Care". -Solutions for this are to: -..
Use a docker log driver that does support multi-line log message, -.. Use a logging format that does not produce multi-line messages, -.. Log from Log4j directly to a logging forwarder or aggregator and bypass the docker logging driver. -. When logging to stdout in Docker, log events pass through Java's standard output handling which is then directed -to the operating system so that the output can be piped into a file. The overhead of all this is measurably slower -than just writing directly to a file as can be seen in these benchmark results where logging -to stdout is 16-20 times slower over repeated runs than logging directly to the file. The results below were obtained by -running the https://github.com/apache/logging-log4j2/blob/release-2.x/log4j-perf-test/src/main/java/org/apache/logging/log4j/perf/jmh/OutputBenchmark.java[Output Benchmark] -on a 2018 MacBook Pro with a 2.9GHz Intel Core i9 processor and a 1TB SSD. However, these results alone would not be -enough to argue against writing to the standard output stream as they only amount to about 14-25 microseconds -per logging call vs 1.5 microseconds when writing to the file. -+ -[source] ---- - Benchmark Mode Cnt Score Error Units - OutputBenchmark.console thrpt 20 39291.885 ± 3370.066 ops/s - OutputBenchmark.file thrpt 20 654584.309 ± 59399.092 ops/s - OutputBenchmark.redirect thrpt 20 70284.576 ± 7452.167 ops/s ---- -. When performing audit logging using a framework such as log4j-audit guaranteed delivery of the audit events -is required. Many of the options for writing the output, including writing to the standard output stream, do -not guarantee delivery. In these cases the event must be delivered to a "forwarder" that acknowledges receipt -only when it has placed the event in durable storage, such as what https://flume.apache.org/[Apache Flume] will do. += Integrating with service-oriented architectures -== Logging Approaches +On this page we share <<best-practices,best practices>> you can employ to integrate applications using Log4j Core with service-oriented architectures. +Along the way, we also provide guides for some popular scenarios. -All the solutions discussed on this page are predicated with the idea that log files cannot permanently -reside on the file system and that all log events should be routed to one or more log analysis tools that will -be used for reporting and alerting. There are many ways to forward and collect events to be sent to the -log analysis tools. +[#motivation] +== Motivation -Note that any approach that bypasses Docker's logging drivers requires Log4j's -https://logging.apache.org/log4j/2.x/manual/lookups.html#DockerLookup[Docker Lookup] to allow Docker attributes to be injected into the log events. +Most modern software is deployed in https://en.wikipedia.org/wiki/Service-oriented_architecture[service-oriented architectures]. +This is a very broad domain and can be realized in an amazingly large number of ways. +Nevertheless, they all redefine the notion of an application: -=== Logging to the Standard Output Stream +* Deployed in *multiple instances* +* Situated in *multiple locations*; either in the same rack, or in different data centers located in different continents +* Hosted by *multiple platforms*; hardware, virtual machine, container, etc. 
+* *Polyglot*; a product of multiple programming languages +* *Scaled* on demand; instances come and go in time -As discussed above, this is the recommended 12-Factor approach for applications running in a docker container. -The Log4j team does not recommend this approach for performance reasons. +Naturally, logging systems also evolved to accommodate these needs. +In particular, the old practice of _"monoliths writing logs to files rotated daily"_ has changed in two major ways: -image:DockerStdout.png[Stdout, "Application Logging to the Standard Output Stream"] +Application delivers logs differently:: -=== Logging to the Standard Output Stream with the Docker Fluentd Logging Driver +Applications no longer write logs to files, but <<structured-encoding,encode them structurally>>, and deliver them to a centrally managed external system. +Most of the time this is a <<proxy,proxy>> (a library, a sidecar container, etc.) that takes care of discovering the log storage system and determining the right external service to forward the logs to. -Docker provides alternate https://docs.docker.com/config/containers/logging/configure/[logging drivers], -such as https://docs.docker.com/config/containers/logging/fluentd/[fluentd], that -can be used to redirect the standard output stream to a log forwarder or log aggregator. +Platform stores logs differently:: -When routing to a log forwarder it is expected that the forwarder will have the same lifetime as the -application. If the forwarder should fail the management tools would be expected to also terminate -other containers dependent on the forwarder. +There is no longer `/var/log/tomcat/app.log` combining all logs of a monolith. +Instead, the software runs in multiple instances, each possibly implemented in a different language, and instances get scaled (i.e., new ones get started, old ones get stopped) on demand. +To accommodate this, logs are persisted on a central storage system (Elasticsearch, Google Cloud Logging, etc.) that allows advanced navigation and filtering capabilities. -image:DockerFluentd.png[Docker Fluentbit, "Logging via StdOut using the Docker Fluentd Logging Driver to Fluent-bit"] +Log4j Core not only adapts to this evolution, but also strives to provide best-in-class support for it. +We will explore how to integrate Log4j with service-oriented architectures. -As an alternative the logging drivers could be configured to route events directly to a logging aggregator. -This is generally not a good idea as the logging drivers only allow a single host and port to be configured. -The docker documentation isn't clear but infers that log events will be dropped when log events cannot be -delivered so this method should not be used if a highly available solution is required. +[#best-practices] +== Best practices -image:DockerFluentdAggregator.png[Docker Fluentd, "Logging via StdOut using the Docker Fluentd Logging Driver to Fluentd"] +Independent of the service-oriented architecture you choose, there are certain best practices we strongly encourage you to follow: -=== Logging to a File +[#structured-encoding] +=== Encode logs using a structured layout -While this is not the recommended 12-Factor approach, it performs very well. -However, it requires that the application declares a volume where the log files will reside and then configures the log forwarder to tail those files. 
-Care must also be taken to automatically manage the disk space used for the logs, which Log4j can perform via the "Delete" action on the xref:manual/appenders.adoc#RollingFileAppender[RollingFileAppender]. +We can't emphasize this enough: use nothing but a xref:manual/layouts.adoc#structured-logging[structured layout] to deliver your logs to an external system. +We recommend xref:manual/json-template-layout.adoc[] for this purpose: -image:DockerLogFile.png[File, "Logging to a File"] +* JSON Template Layout provides full customizability and contains several predefined layouts for popular log storage services. +* JSON is accepted by every log storage service. +* JSON is supported by logging frameworks in other languages. +This makes it possible to agree on a common log format with non-Java applications. -=== Sending Directly to a Log Forwarder via TCP +[#proxy] +=== Use a proxy for writing logs -Sending logs directly to a Log Forwarder is simple as it generally just requires that the forwarder's host and port be configured on a SocketAppender with an appropriate layout. +Most of the time it is not a good idea to write to the log storage system directly; instead, delegate that task to a proxy. +This design decouples applications' log target and the log storage system and, as a result, effectively enables each to evolve independently and reliably (i.e., without downtime). +For instance, this will allow the log storage system to scale or migrate to a new environment while proxies take care of necessary buffering and routing. -image:DockerTCP.png[TCP, "Application Logging to a Forwarder via TCP"] +This proxy can appear in many forms, for instance: -=== Sending Directly to a Log Aggregator via TCP +* *Console* can act as a proxy. +Logs written to console can be consumed by an external service. +For example, https://12factor.net/logs[The Twelve-Factor App] and https://kubernetes.io/docs/concepts/cluster-administration/logging/[Kubernetes Logging Architecture] recommend this approach. -Similar to sending logs to a forwarder, logs can also be sent to a cluster of aggregators. -However, setting this up is not as simple since, to be highly available, a cluster of aggregators must be used. -However, the SocketAppender currently can only be configured with a single host and port. -To allow for failover if the primary aggregator fails the SocketAppender must be enclosed in a xref:manual/appenders.adoc#FailoverAppender[FailoverAppender], which would also have the secondary aggregator configured. -Another option is to have the SocketAppender point to a highly available proxy that can forward to the Log Aggregator. +* A *library* can act as a proxy. +It can tap into the logging API and forward log events to an external service. +For instance, https://docs.datadoghq.com/logs/log_collection/java[Datadog's Java Log Collector] uses this mechanism. -If the log aggregator used is Apache Flume (or similar) the Appenders for these support -being configured with a list of hosts and ports so high availability is not an issue. +* An external *service* can act as a proxy, which applications can write logs to. +For example, you can write to https://www.elastic.co/logstash[Logstash], a https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent[Kubernetes logging agent sidecar], or a https://redis.io/glossary/redis-queue/[Redis queue] over a socket. 
-image:LoggerAggregator.png[Aggregator, "Application Logging to an Aggregator via TCP"] +What to use as a proxy depends on your deployment environment. +You should consult your colleagues to find out whether there is already an established logging proxy convention. +Otherwise, we strongly encourage you to establish one in collaboration with your system administrators and architects. -[#ELK] -== Logging using Elasticsearch, Logstash, and Kibana +[#appender] +=== Configure your appender correctly -There are various approaches with different trade-offs for ingesting logs into -an ELK stack. Here we will briefly cover how one can forward Log4j generated -events first to Logstash and then to Elasticsearch. +Once you decide on <<proxy,the log proxy>> to use, the choice of appender pretty much becomes self-evident. +Nevertheless, there are some tips we recommend you follow: -=== Log4j Configuration +* *For writing to console*, use a xref:manual/appenders.adoc#ConsoleAppender[Console Appender] and make sure to configure its `direct` attribute to `true` for maximum efficiency. -==== JsonTemplateLayout +* *For writing to an external service*, use a xref:manual/appenders.adoc#SocketAppender[Socket Appender] and make sure to configure the null delimiter of the associated layout. +For instance, see xref:manual/json-template-layout.adoc#plugin-attr-nullEventDelimiterEnabled[the `nullEventDelimiterEnabled` configuration attribute of JSON Template Layout]. 
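For illustration, here is a minimal sketch of that second tip; the host, port, and event template below are placeholder assumptions, not prescriptions:

[source,xml]
----
<!-- A minimal sketch: ship null-delimited JSON over TCP to an external collector.
     `logstash` and `12345` stand in for your own log proxy endpoint. -->
<Socket name="Collector"
        host="logstash"
        port="12345"
        protocol="TCP">
  <!-- The null delimiter lets the receiving end split the byte stream back into individual events. -->
  <JsonTemplateLayout eventTemplateUri="classpath:EcsLayout.json"
                      nullEventDelimiterEnabled="true"/>
</Socket>
----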
+[#file] +=== Avoid writing to files -[source,xml] ----- -<Socket name="Logstash" - host="${sys:logstash.host}" - port="12345" - protocol="tcp" - bufferedIo="true"> - <JsonTemplateLayout eventTemplateUri="classpath:EcsLayout.json"> - <EventTemplateAdditionalField key="containerId" value="${docker:containerId:-}"/> - <EventTemplateAdditionalField key="application" value="${lower:${spring:spring.application.name:-spring}}"/> - <EventTemplateAdditionalField key="kubernetes.serviceAccountName" value="${k8s:accountName:-}"/> - <EventTemplateAdditionalField key="kubernetes.containerId" value="${k8s:containerId:-}"/> - <EventTemplateAdditionalField key="kubernetes.containerName" value="${k8s:containerName:-}"/> - <EventTemplateAdditionalField key="kubernetes.host" value="${k8s:host:-}"/> - <EventTemplateAdditionalField key="kubernetes.labels.app" value="${k8s:labels.app:-}"/> - <EventTemplateAdditionalField key="kubernetes.labels.pod-template-hash" value="${k8s:labels.podTemplateHash:-}"/> - <EventTemplateAdditionalField key="kubernetes.master_url" value="${k8s:masterUrl:-}"/> - <EventTemplateAdditionalField key="kubernetes.namespaceId" value="${k8s:namespaceId:-}"/> - <EventTemplateAdditionalField key="kubernetes.namespaceName" value="${k8s:namespaceName:-}"/> - <EventTemplateAdditionalField key="kubernetes.podID" value="${k8s:podId:-}"/> - <EventTemplateAdditionalField key="kubernetes.podIP" value="${k8s:podIp:-}"/> - <EventTemplateAdditionalField key="kubernetes.podName" value="${k8s:podName:-}"/> - <EventTemplateAdditionalField key="kubernetes.imageId" value="${k8s:imageId:-}"/> - <EventTemplateAdditionalField key="kubernetes.imageName" value="${k8s:imageName:-}"/> - </JsonTemplateLayout> -</Socket> ----- +As explained in <<motivation>>, in a service-oriented architecture, log files are -==== Gelf Template +* Difficult to maintain – writable volumes must be mounted to the runtime (container, VM, etc.), rotated, and monitored for excessive usage +* Difficult to use – multiple files need to be manually combined while troubleshooting, no central navigation point +* Difficult to interoperate – each application needs to be individually configured to produce the same structured log output to enable interleaving of logs from multiple sources while troubleshooting distributed issues -The JsonTemplateLayout can also be used to generate JSON that matches the GELF specification which can format -the message attribute using a pattern in accordance with the PatternLayout. For example, the following -template, named EnhancedGelf.json, can be used to generate GELF-compliant data that can be passed to Logstash. -With this template the message attribute will include the thread id, level, specific ThreadContext attributes, -the class name, method name, and line number as well as the message. If an exception is included it will also -be included with newlines. This format follows very closely what you would see in a typical log file on disk -using the PatternLayout but has the additional advantage of including the attributes as separate fields that -can be queried. +In short, *we don't recommend writing logs to files*. 
-[source,json] ---- { - "version": "1.1", - "host": "${hostName}", - "short_message": { - "$resolver": "message", - "stringified": true - }, - "full_message": { - "$resolver": "message", - "pattern": "[%t] %-5p %X{requestId, sessionId, loginId, userId, ipAddress, corpAcctNumber} %C{1.}.%M:%L - %m", - "stringified": true - }, - "timestamp": { - "$resolver": "timestamp", - "epoch": { - "unit": "secs" - } - }, - "level": { - "$resolver": "level", - "field": "severity", - "severity": { - "field": "code" - } - }, - "_logger": { - "$resolver": "logger", - "field": "name" - }, - "_thread": { - "$resolver": "thread", - "field": "name" - }, - "_mdc": { - "$resolver": "mdc", - "flatten": { - "prefix": "_" - }, - "stringified": true - } -} ---- +A common excuse for writing logs to files is to use them as a buffer prior to ingesting them either to the logging system proxy or to the log storage system. +Even in this case, most of the time it is a sign of bad architectural design, and it costs extra resources and degrades efficiency. -The logging configuration to use this template would be -[source,xml] +We strongly advise you to separate the logging configuration from the application and couple them in an environment-specific way. +This will allow you to + +* Address environment-specific configurations (e.g., logging verbosity needs of test and production can be different) +* Ensure Log4j configuration changes apply to all affected Log4j-using software without the need to manually update their Log4j configuration one by one + +How to implement this separation pretty much depends on your setup. +We will share some recommended approaches to give you an idea: + +Ship configuration files along with the application, and choose a subset of them during deployment:: Review Comment: There are literally half a dozen ways to deploy the configuration. 1. Providing `log4j2.configurationFile` that points to a centralized configuration store, 2. Merging different configuration files (why not merge a local "fail-over" config with a remote one?), 3. Using arbiters to select the logging profile, 4. Using property sources to configure the logging level, 5. Mounting an external volume on the container, 6. Custom configurations. I think we should point the user to the appropriate chapter and let them choose. ########## src/site/antora/modules/ROOT/pages/manual/cloud.adoc: ########## @@ -15,549 +15,267 @@ limitations under the License. //// -= Using Log4j in Cloud Enabled Applications - -== The Twelve-Factor Application - -The Logging Guidelines for https://12factor.net/logs[The Twelve-Factor App] state that all logs should be routed -unbuffered to stdout. Since this is the least common denominator it is guaranteed to work for all applications. However, -as with any set of general guidelines, choosing the least common denominator approach comes at a cost. Some of the costs -in Java applications include: - -. Java stack traces are multi-line log messages. The standard docker log driver cannot handle these properly. See -https://github.com/moby/moby/issues/22920[Docker Issue #22920] which was closed with the message "Don't Care". -Solutions for this are to: -.. Use a docker log driver that does support multi-line log message, -.. Use a logging format that does not produce multi-line messages, -.. Log from Log4j directly to a logging forwarder or aggregator and bypass the docker logging driver. -. 
When logging to stdout in Docker, log events pass through Java's standard output handling which is then directed -to the operating system so that the output can be piped into a file. The overhead of all this is measurably slower -than just writing directly to a file as can be seen in these benchmark results where logging -to stdout is 16-20 times slower over repeated runs than logging directly to the file. The results below were obtained by -running the https://github.com/apache/logging-log4j2/blob/release-2.x/log4j-perf-test/src/main/java/org/apache/logging/log4j/perf/jmh/OutputBenchmark.java[Output Benchmark] -on a 2018 MacBook Pro with a 2.9GHz Intel Core i9 processor and a 1TB SSD. However, these results alone would not be -enough to argue against writing to the standard output stream as they only amount to about 14-25 microseconds -per logging call vs 1.5 microseconds when writing to the file. -+ -[source] ---- - Benchmark Mode Cnt Score Error Units - OutputBenchmark.console thrpt 20 39291.885 ± 3370.066 ops/s - OutputBenchmark.file thrpt 20 654584.309 ± 59399.092 ops/s - OutputBenchmark.redirect thrpt 20 70284.576 ± 7452.167 ops/s ---- -. When performing audit logging using a framework such as log4j-audit guaranteed delivery of the audit events -is required. Many of the options for writing the output, including writing to the standard output stream, do -not guarantee delivery. In these cases the event must be delivered to a "forwarder" that acknowledges receipt -only when it has placed the event in durable storage, such as what https://flume.apache.org/[Apache Flume] will do. += Integrating with service-oriented architectures -== Logging Approaches +On this page we share <<best-practices,best practices>> you can employ to integrate applications using Log4j Core with service-oriented architectures. +Along the way, we also provide guides for some popular scenarios. -All the solutions discussed on this page are predicated with the idea that log files cannot permanently -reside on the file system and that all log events should be routed to one or more log analysis tools that will -be used for reporting and alerting. There are many ways to forward and collect events to be sent to the -log analysis tools. +[#motivation] +== Motivation -Note that any approach that bypasses Docker's logging drivers requires Log4j's -https://logging.apache.org/log4j/2.x/manual/lookups.html#DockerLookup[Docker Lookup] to allow Docker attributes to be injected into the log events. +Most modern software is deployed in https://en.wikipedia.org/wiki/Service-oriented_architecture[service-oriented architectures]. +This is a very broad domain and can be realized in an amazingly large number of ways. +Nevertheless, they all redefine the notion of an application: -=== Logging to the Standard Output Stream +* Deployed in *multiple instances* +* Situated in *multiple locations*; either in the same rack, or in different data centers located in different continents +* Hosted by *multiple platforms*; hardware, virtual machine, container, etc. +* *Polyglot*; a product of multiple programming languages +* *Scaled* on demand; instances come and go in time -As discussed above, this is the recommended 12-Factor approach for applications running in a docker container. -The Log4j team does not recommend this approach for performance reasons. +Naturally, logging systems also evolved to accommodate these needs. 
+In particular, the old practice of _"monoliths writing logs to files rotated daily"_ has changed in two major ways: -image:DockerStdout.png[Stdout, "Application Logging to the Standard Output Stream"] +Application delivers logs differently:: -=== Logging to the Standard Output Stream with the Docker Fluentd Logging Driver +Applications no longer write logs to files, but <<structured-encoding,encode them structurally>>, and deliver them to a centrally managed external system. +Most of the time this is a <<proxy,proxy>> (a library, a sidecar container, etc.) that takes care of discovering the log storage system and determining the right external service to forward the logs to. -Docker provides alternate https://docs.docker.com/config/containers/logging/configure/[logging drivers], -such as https://docs.docker.com/config/containers/logging/fluentd/[fluentd], that -can be used to redirect the standard output stream to a log forwarder or log aggregator. +Platform stores logs differently:: -When routing to a log forwarder it is expected that the forwarder will have the same lifetime as the -application. If the forwarder should fail the management tools would be expected to also terminate -other containers dependent on the forwarder. +There is no longer `/var/log/tomcat/app.log` combining all logs of a monolith. +Instead, the software runs in multiple instances, each possibly implemented in a different language, and instances get scaled (i.e., new ones get started, old ones get stopped) on demand. +To accommodate this, logs are persisted on a central storage system (Elasticsearch, Google Cloud Logging, etc.) that allows advanced navigation and filtering capabilities. -image:DockerFluentd.png[Docker Fluentbit, "Logging via StdOut using the Docker Fluentd Logging Driver to Fluent-bit"] +Log4j Core not only adapts to this evolution, but also strives to provide best-in-class support for it. +We will explore how to integrate Log4j with service-oriented architectures. -As an alternative the logging drivers could be configured to route events directly to a logging aggregator. -This is generally not a good idea as the logging drivers only allow a single host and port to be configured. -The docker documentation isn't clear but infers that log events will be dropped when log events cannot be -delivered so this method should not be used if a highly available solution is required. +[#best-practices] +== Best practices -image:DockerFluentdAggregator.png[Docker Fluentd, "Logging via StdOut using the Docker Fluentd Logging Driver to Fluentd"] +Independent of the service-oriented architecture you choose, there are certain best practices we strongly encourage you to follow: -=== Logging to a File +[#structured-encoding] +=== Encode logs using a structured layout -While this is not the recommended 12-Factor approach, it performs very well. -However, it requires that the application declares a volume where the log files will reside and then configures the log forwarder to tail those files. -Care must also be taken to automatically manage the disk space used for the logs, which Log4j can perform via the "Delete" action on the xref:manual/appenders.adoc#RollingFileAppender[RollingFileAppender]. +We can't emphasize this enough: use nothing but a xref:manual/layouts.adoc#structured-logging[structured layout] to deliver your logs to an external system. 
+We recommend xref:manual/json-template-layout.adoc[] for this purpose: -image:DockerLogFile.png[File, "Logging to a File"] +* JSON Template Layout provides full customizability and contains several predefined layouts for popular log storage services. +* JSON is accepted by every log storage service. +* JSON is supported by logging frameworks in other languages. +This makes it possible to agree on a common log format with non-Java applications. -=== Sending Directly to a Log Forwarder via TCP +[#proxy] +=== Use a proxy for writing logs -Sending logs directly to a Log Forwarder is simple as it generally just requires that the forwarder's host and port be configured on a SocketAppender with an appropriate layout. +Most of the time it is not a good idea to write to the log storage system directly; instead, delegate that task to a proxy. +This design decouples applications' log target and the log storage system and, as a result, effectively enables each to evolve independently and reliably (i.e., without downtime). +For instance, this will allow the log storage system to scale or migrate to a new environment while proxies take care of necessary buffering and routing. -image:DockerTCP.png[TCP, "Application Logging to a Forwarder via TCP"] +This proxy can appear in many forms, for instance: -=== Sending Directly to a Log Aggregator via TCP +* *Console* can act as a proxy. +Logs written to console can be consumed by an external service. +For example, https://12factor.net/logs[The Twelve-Factor App] and https://kubernetes.io/docs/concepts/cluster-administration/logging/[Kubernetes Logging Architecture] recommend this approach. -Similar to sending logs to a forwarder, logs can also be sent to a cluster of aggregators. -However, setting this up is not as simple since, to be highly available, a cluster of aggregators must be used. -However, the SocketAppender currently can only be configured with a single host and port. -To allow for failover if the primary aggregator fails the SocketAppender must be enclosed in a xref:manual/appenders.adoc#FailoverAppender[FailoverAppender], which would also have the secondary aggregator configured. -Another option is to have the SocketAppender point to a highly available proxy that can forward to the Log Aggregator. +* A *library* can act as a proxy. +It can tap into the logging API and forward log events to an external service. +For instance, https://docs.datadoghq.com/logs/log_collection/java[Datadog's Java Log Collector] uses this mechanism. -If the log aggregator used is Apache Flume (or similar) the Appenders for these support -being configured with a list of hosts and ports so high availability is not an issue. +* An external *service* can act as a proxy, which applications can write logs to. +For example, you can write to https://www.elastic.co/logstash[Logstash], a https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent[Kubernetes logging agent sidecar], or a https://redis.io/glossary/redis-queue/[Redis queue] over a socket. -image:LoggerAggregator.png[Aggregator, "Application Logging to an Aggregator via TCP"] +What to use as a proxy depends on your deployment environment. +You should consult your colleagues to find out whether there is already an established logging proxy convention. +Otherwise, we strongly encourage you to establish one in collaboration with your system administrators and architects. 
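To make the console-as-proxy option above concrete, here is a minimal sketch; the ECS event template is an illustrative choice, not a requirement:

[source,xml]
----
<!-- A minimal sketch: emit structured JSON on stdout so that a platform-level
     collector (e.g., a Kubernetes node agent) can pick the events up. -->
<Console name="STDOUT" direct="true"> <!-- `direct="true"` bypasses java.lang.System for efficiency -->
  <JsonTemplateLayout eventTemplateUri="classpath:EcsLayout.json"/>
</Console>
----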
-[#ELK] -== Logging using Elasticsearch, Logstash, and Kibana +[#appender] +=== Configure your appender correctly -There are various approaches with different trade-offs for ingesting logs into -an ELK stack. Here we will briefly cover how one can forward Log4j generated -events first to Logstash and then to Elasticsearch. +Once you decide on <<proxy,the log proxy>> to use, the choice of appender pretty much becomes self-evident. +Nevertheless, there are some tips we recommend you follow: -=== Log4j Configuration +* *For writing to console*, use a xref:manual/appenders.adoc#ConsoleAppender[Console Appender] and make sure to configure its `direct` attribute to `true` for maximum efficiency. -==== JsonTemplateLayout +* *For writing to an external service*, use a xref:manual/appenders.adoc#SocketAppender[Socket Appender] and make sure to configure the null delimiter of the associated layout. Review Comment: Should we advise users to prefer `TCP` over `UDP`? At least where `SocketAppender` is concerned. We don't have an appender that uses `QUIC`. ########## src/site/antora/modules/ROOT/pages/manual/cloud.adoc: ########## @@ -15,549 +15,267 @@ limitations under the License. //// -= Using Log4j in Cloud Enabled Applications - -== The Twelve-Factor Application - -The Logging Guidelines for https://12factor.net/logs[The Twelve-Factor App] state that all logs should be routed -unbuffered to stdout. Since this is the least common denominator it is guaranteed to work for all applications. However, -as with any set of general guidelines, choosing the least common denominator approach comes at a cost. Some of the costs -in Java applications include: - -. Java stack traces are multi-line log messages. The standard docker log driver cannot handle these properly. See -https://github.com/moby/moby/issues/22920[Docker Issue #22920] which was closed with the message "Don't Care". -Solutions for this are to: -.. Use a docker log driver that does support multi-line log message, -.. Use a logging format that does not produce multi-line messages, -.. Log from Log4j directly to a logging forwarder or aggregator and bypass the docker logging driver. -. When logging to stdout in Docker, log events pass through Java's standard output handling which is then directed -to the operating system so that the output can be piped into a file. The overhead of all this is measurably slower -than just writing directly to a file as can be seen in these benchmark results where logging -to stdout is 16-20 times slower over repeated runs than logging directly to the file. The results below were obtained by -running the https://github.com/apache/logging-log4j2/blob/release-2.x/log4j-perf-test/src/main/java/org/apache/logging/log4j/perf/jmh/OutputBenchmark.java[Output Benchmark] -on a 2018 MacBook Pro with a 2.9GHz Intel Core i9 processor and a 1TB SSD. However, these results alone would not be -enough to argue against writing to the standard output stream as they only amount to about 14-25 microseconds -per logging call vs 1.5 microseconds when writing to the file. -+ -[source] ---- - Benchmark Mode Cnt Score Error Units - OutputBenchmark.console thrpt 20 39291.885 ± 3370.066 ops/s - OutputBenchmark.file thrpt 20 654584.309 ± 59399.092 ops/s - OutputBenchmark.redirect thrpt 20 70284.576 ± 7452.167 ops/s ---- -. When performing audit logging using a framework such as log4j-audit guaranteed delivery of the audit events -is required. 
Many of the options for writing the output, including writing to the standard output stream, do -not guarantee delivery. In these cases the event must be delivered to a "forwarder" that acknowledges receipt -only when it has placed the event in durable storage, such as what https://flume.apache.org/[Apache Flume] will do. += Integrating with service-oriented architectures -== Logging Approaches +On this page we share <<best-practices,best practices>> you can employ to integrate applications using Log4j Core with service-oriented architectures. +Along the way, we also provide guides for some popular scenarios. -All the solutions discussed on this page are predicated with the idea that log files cannot permanently -reside on the file system and that all log events should be routed to one or more log analysis tools that will -be used for reporting and alerting. There are many ways to forward and collect events to be sent to the -log analysis tools. +[#motivation] +== Motivation -Note that any approach that bypasses Docker's logging drivers requires Log4j's -https://logging.apache.org/log4j/2.x/manual/lookups.html#DockerLookup[Docker Lookup] to allow Docker attributes to be injected into the log events. +Most modern software is deployed in https://en.wikipedia.org/wiki/Service-oriented_architecture[service-oriented architectures]. +This is a very broad domain and can be realized in an amazingly large number of ways. +Nevertheless, they all redefine the notion of an application: -=== Logging to the Standard Output Stream +* Deployed in *multiple instances* +* Situated in *multiple locations*; either in the same rack, or in different data centers located in different continents +* Hosted by *multiple platforms*; hardware, virtual machine, container, etc. +* *Polyglot*; a product of multiple programming languages +* *Scaled* on demand; instances come and go in time -As discussed above, this is the recommended 12-Factor approach for applications running in a docker container. -The Log4j team does not recommend this approach for performance reasons. +Naturally, logging systems also evolved to accommodate these needs. +In particular, the old practice of _"monoliths writing logs to files rotated daily"_ has changed in two major ways: -image:DockerStdout.png[Stdout, "Application Logging to the Standard Output Stream"] +Application delivers logs differently:: -=== Logging to the Standard Output Stream with the Docker Fluentd Logging Driver +Applications no longer write logs to files, but <<structured-encoding,encode them structurally>>, and deliver them to a centrally managed external system. +Most of the time this is a <<proxy,proxy>> (a library, a sidecar container, etc.) that takes care of discovering the log storage system and determining the right external service to forward the logs to. -Docker provides alternate https://docs.docker.com/config/containers/logging/configure/[logging drivers], -such as https://docs.docker.com/config/containers/logging/fluentd/[fluentd], that -can be used to redirect the standard output stream to a log forwarder or log aggregator. +Platform stores logs differently:: -When routing to a log forwarder it is expected that the forwarder will have the same lifetime as the -application. If the forwarder should fail the management tools would be expected to also terminate -other containers dependent on the forwarder. +There is no longer `/var/log/tomcat/app.log` combining all logs of a monolith. 
+Instead, the software runs in multiple instances, each possibly implemented in a different language, and instances get scaled (i.e., new ones get started, old ones get stopped) on demand. +To accommodate this, logs are persisted on a central storage system (Elasticsearch, Google Cloud Logging, etc.) that allows advanced navigation and filtering capabilities. -image:DockerFluentd.png[Docker Fluentbit, "Logging via StdOut using the Docker Fluentd Logging Driver to Fluent-bit"] +Log4j Core not only adapts to this evolution, but also strives to provide best-in-class support for it. +We will explore how to integrate Log4j with service-oriented architectures. -As an alternative the logging drivers could be configured to route events directly to a logging aggregator. -This is generally not a good idea as the logging drivers only allow a single host and port to be configured. -The docker documentation isn't clear but infers that log events will be dropped when log events cannot be -delivered so this method should not be used if a highly available solution is required. +[#best-practices] +== Best practices -image:DockerFluentdAggregator.png[Docker Fluentd, "Logging via StdOut using the Docker Fluentd Logging Driver to Fluentd"] +Independent of the service-oriented architecture you choose, there are certain best practices we strongly encourage you to follow: -=== Logging to a File +[#structured-encoding] +=== Encode logs using a structured layout -While this is not the recommended 12-Factor approach, it performs very well. -However, it requires that the application declares a volume where the log files will reside and then configures the log forwarder to tail those files. -Care must also be taken to automatically manage the disk space used for the logs, which Log4j can perform via the "Delete" action on the xref:manual/appenders.adoc#RollingFileAppender[RollingFileAppender]. +We can't emphasize this enough: use nothing but a xref:manual/layouts.adoc#structured-logging[structured layout] to deliver your logs to an external system. +We recommend xref:manual/json-template-layout.adoc[] for this purpose: -image:DockerLogFile.png[File, "Logging to a File"] +* JSON Template Layout provides full customizability and contains several predefined layouts for popular log storage services. +* JSON is accepted by every log storage service. +* JSON is supported by logging frameworks in other languages. +This makes it possible to agree on a common log format with non-Java applications. -=== Sending Directly to a Log Forwarder via TCP +[#proxy] +=== Use a proxy for writing logs -Sending logs directly to a Log Forwarder is simple as it generally just requires that the forwarder's host and port be configured on a SocketAppender with an appropriate layout. +Most of the time it is not a good idea to write to the log storage system directly; instead, delegate that task to a proxy. +This design decouples applications' log target and the log storage system and, as a result, effectively enables each to evolve independently and reliably (i.e., without downtime). +For instance, this will allow the log storage system to scale or migrate to a new environment while proxies take care of necessary buffering and routing. -image:DockerTCP.png[TCP, "Application Logging to a Forwarder via TCP"] +This proxy can appear in many forms, for instance: -=== Sending Directly to a Log Aggregator via TCP +* *Console* can act as a proxy. +Logs written to console can be consumed by an external service. 
+For example, https://12factor.net/logs[The Twelve-Factor App] and https://kubernetes.io/docs/concepts/cluster-administration/logging/[Kubernetes Logging Architecture] recommend this approach. -Similar to sending logs to a forwarder, logs can also be sent to a cluster of aggregators. -However, setting this up is not as simple since, to be highly available, a cluster of aggregators must be used. -However, the SocketAppender currently can only be configured with a single host and port. -To allow for failover if the primary aggregator fails the SocketAppender must be enclosed in a xref:manual/appenders.adoc#FailoverAppender[FailoverAppender], which would also have the secondary aggregator configured. -Another option is to have the SocketAppender point to a highly available proxy that can forward to the Log Aggregator. +* A *library* can act as a proxy. +It can tap into the logging API and forward log events to an external service. +For instance, https://docs.datadoghq.com/logs/log_collection/java[Datadog's Java Log Collector] uses this mechanism. -If the log aggregator used is Apache Flume (or similar) the Appenders for these support -being configured with a list of hosts and ports so high availability is not an issue. +* An external *service* can act as a proxy, which applications can write logs to. +For example, you can write to https://www.elastic.co/logstash[Logstash], a https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent[Kubernetes logging agent sidecar], or a https://redis.io/glossary/redis-queue/[Redis queue] over a socket. -image:LoggerAggregator.png[Aggregator, "Application Logging to an Aggregator via TCP"] +What to use as a proxy depends on your deployment environment. +You should consult your colleagues to find out whether there is already an established logging proxy convention. +Otherwise, we strongly encourage you to establish one in collaboration with your system administrators and architects. -[#ELK] -== Logging using Elasticsearch, Logstash, and Kibana +[#appender] +=== Configure your appender correctly -There are various approaches with different trade-offs for ingesting logs into -an ELK stack. Here we will briefly cover how one can forward Log4j generated -events first to Logstash and then to Elasticsearch. +Once you decide on <<proxy,the log proxy>> to use, the choice of appender pretty much becomes self-evident. +Nevertheless, there are some tips we recommend you follow: -=== Log4j Configuration +* *For writing to console*, use a xref:manual/appenders.adoc#ConsoleAppender[Console Appender] and make sure to configure its `direct` attribute to `true` for maximum efficiency. -==== JsonTemplateLayout +* *For writing to an external service*, use a xref:manual/appenders.adoc#SocketAppender[Socket Appender] and make sure to configure the null delimiter of the associated layout. +For instance, see xref:manual/json-template-layout.adoc#plugin-attr-nullEventDelimiterEnabled[the `nullEventDelimiterEnabled` configuration attribute of JSON Template Layout]. -Log4j provides a multitude of JSON generating layouts. In particular, JSON -Template Layout allows full schema -customization and bundles ELK-specific layouts by default, which makes it a -great fit for the bill. Using the EcsLayout template as shown below will generate data in Kibana where -the message displayed exactly matches the message passed to Log4j and most of the event attributes, including -any exceptions, are present as individual attributes that can be displayed. 
Note, however that stack traces -will be formatted without newlines. +[#file] +=== Avoid writing to files -[source,xml] ----- -<Socket name="Logstash" - host="${sys:logstash.host}" - port="12345" - protocol="tcp" - bufferedIo="true"> - <JsonTemplateLayout eventTemplateUri="classpath:EcsLayout.json"> - <EventTemplateAdditionalField key="containerId" value="${docker:containerId:-}"/> - <EventTemplateAdditionalField key="application" value="${lower:${spring:spring.application.name:-spring}}"/> - <EventTemplateAdditionalField key="kubernetes.serviceAccountName" value="${k8s:accountName:-}"/> - <EventTemplateAdditionalField key="kubernetes.containerId" value="${k8s:containerId:-}"/> - <EventTemplateAdditionalField key="kubernetes.containerName" value="${k8s:containerName:-}"/> - <EventTemplateAdditionalField key="kubernetes.host" value="${k8s:host:-}"/> - <EventTemplateAdditionalField key="kubernetes.labels.app" value="${k8s:labels.app:-}"/> - <EventTemplateAdditionalField key="kubernetes.labels.pod-template-hash" value="${k8s:labels.podTemplateHash:-}"/> - <EventTemplateAdditionalField key="kubernetes.master_url" value="${k8s:masterUrl:-}"/> - <EventTemplateAdditionalField key="kubernetes.namespaceId" value="${k8s:namespaceId:-}"/> - <EventTemplateAdditionalField key="kubernetes.namespaceName" value="${k8s:namespaceName:-}"/> - <EventTemplateAdditionalField key="kubernetes.podID" value="${k8s:podId:-}"/> - <EventTemplateAdditionalField key="kubernetes.podIP" value="${k8s:podIp:-}"/> - <EventTemplateAdditionalField key="kubernetes.podName" value="${k8s:podName:-}"/> - <EventTemplateAdditionalField key="kubernetes.imageId" value="${k8s:imageId:-}"/> - <EventTemplateAdditionalField key="kubernetes.imageName" value="${k8s:imageName:-}"/> - </JsonTemplateLayout> -</Socket> ----- +As explained in <<motivation>>, in a service-oriented architecture, log files are -==== Gelf Template +* Difficult to maintain – writable volumes must be mounted to the runtime (container, VM, etc.), rotated, and monitored for excessive usage +* Difficult to use – multiple files need to be manually combined while troubleshooting, no central navigation point +* Difficult to interoperate – each application needs to be individually configured to produce the same structured log output to enable interleaving of logs from multiple sources while troubleshooting distributed issues -The JsonTemplateLayout can also be used to generate JSON that matches the GELF specification which can format -the message attribute using a pattern in accordance with the PatternLayout. For example, the following -template, named EnhancedGelf.json, can be used to generate GELF-compliant data that can be passed to Logstash. -With this template the message attribute will include the thread id, level, specific ThreadContext attributes, -the class name, method name, and line number as well as the message. If an exception is included it will also -be included with newlines. This format follows very closely what you would see in a typical log file on disk -using the PatternLayout but has the additional advantage of including the attributes as separate fields that -can be queried. +In short, *we don't recommend writing logs to files*. 
-[source,json] ---- { - "version": "1.1", - "host": "${hostName}", - "short_message": { - "$resolver": "message", - "stringified": true - }, - "full_message": { - "$resolver": "message", - "pattern": "[%t] %-5p %X{requestId, sessionId, loginId, userId, ipAddress, corpAcctNumber} %C{1.}.%M:%L - %m", - "stringified": true - }, - "timestamp": { - "$resolver": "timestamp", - "epoch": { - "unit": "secs" - } - }, - "level": { - "$resolver": "level", - "field": "severity", - "severity": { - "field": "code" - } - }, - "_logger": { - "$resolver": "logger", - "field": "name" - }, - "_thread": { - "$resolver": "thread", - "field": "name" - }, - "_mdc": { - "$resolver": "mdc", - "flatten": { - "prefix": "_" - }, - "stringified": true - } -} ---- +A common excuse for writing logs to files is to use them as a buffer prior to ingesting them either to the logging system proxy or to the log storage system. +Even in this case, most of the time it is a sign of bad architectural design, and it costs extra resources and degrades efficiency. Review Comment: Isn't this exactly what proxies do? Flume is out of fashion, but it has such functionality (as far as I understand, it is the only functionality worth splitting out). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
