ppkarwasz commented on code in PR #2715:
URL: https://github.com/apache/logging-log4j2/pull/2715#discussion_r1668418867


##########
src/site/antora/modules/ROOT/pages/manual/cloud.adoc:
##########
@@ -15,549 +15,267 @@
  limitations under the License.
 ////
 
-= Using Log4j in Cloud Enabled Applications
-
-== The Twelve-Factor Application
-
-The Logging Guidelines for https://12factor.net/logs[The Twelve-Factor App] 
state that all logs should be routed
-unbuffered to stdout. Since this is the least common denominator it is 
guaranteed to work for all applications. However,
-as with any set of general guidelines, choosing the least common denominator 
approach comes at a cost. Some of the costs
-in Java applications include:
-
-. Java stack traces are multi-line log messages. The standard docker log 
driver cannot handle these properly. See
-https://github.com/moby/moby/issues/22920[Docker Issue #22920] which was 
closed with the message "Don't Care".
-Solutions for this are to:
-.. Use a docker log driver that does support multi-line log message,
-.. Use a logging format that does not produce multi-line messages,
-.. Log from Log4j directly to a logging forwarder or aggregator and bypass the 
docker logging driver.
-. When logging to stdout in Docker, log events pass through Java's standard 
output handling which is then directed
-to the operating system so that the output can be piped into a file. The 
overhead of all this is measurably slower
-than just writing directly to a file as can be seen in these benchmark results 
where logging
-to stdout is 16-20 times slower over repeated runs than logging directly to 
the file. The results below were obtained by
-running the 
https://github.com/apache/logging-log4j2/blob/release-2.x/log4j-perf-test/src/main/java/org/apache/logging/log4j/perf/jmh/OutputBenchmark.java[Output
 Benchmark]
-on a 2018 MacBook Pro with a 2.9GHz Intel Core i9 processor and a 1TB SSD. 
However, these results alone would not be
-enough to argue against writing to the standard output stream as they only 
amount to about 14-25 microseconds
-per logging call vs 1.5 microseconds when writing to the file.
-+
-[source]
-----
-    Benchmark                  Mode  Cnt       Score       Error  Units
-    OutputBenchmark.console   thrpt   20   39291.885 ±  3370.066  ops/s
-    OutputBenchmark.file      thrpt   20  654584.309 ± 59399.092  ops/s
-    OutputBenchmark.redirect  thrpt   20   70284.576 ±  7452.167  ops/s
-----
-. When performing audit logging using a framework such as log4j-audit 
guaranteed delivery of the audit events
-is required. Many of the options for writing the output, including writing to 
the standard output stream, do
-not guarantee delivery. In these cases the event must be delivered to a 
"forwarder" that acknowledges receipt
-only when it has placed the event in durable storage, such as what 
https://flume.apache.org/[Apache Flume] will do.
+= Integrating with service-oriented architectures
 
-== Logging Approaches
+On this page we share <<best-practices,best practices>> you can employ in your applications using Log4j Core to integrate them with service-oriented architectures.
+While doing so, we also provide guides for some popular scenarios.
 
-All the solutions discussed on this page are predicated with the idea that log 
files cannot permanently
-reside on the file system and that all log events should be routed to one or 
more log analysis tools that will
-be used for reporting and alerting. There are many ways to forward and collect 
events to be sent to the
-log analysis tools.
+[#motivation]
+== Motivation
 
-Note that any approach that bypasses Docker's logging drivers requires Log4j's
-https://logging.apache.org/log4j/2.x/manual/lookups.html#DockerLookup[Docker 
Lookup] to allow Docker attributes to be injected into the log events.
+Most modern software is deployed in https://en.wikipedia.org/wiki/Service-oriented_architecture[service-oriented architectures].
+This is a very broad domain that can be realized in a remarkably large number of ways.
+Nevertheless, all of these architectures redefine the notion of an application:
 
-=== Logging to the Standard Output Stream
+* Deployed in *multiple instances*
+* Situated in *multiple locations*; either in the same rack, or in data centers on different continents
+* Hosted by *multiple platforms*; hardware, virtual machines, containers, etc.
+* *Polyglot*; a product of multiple programming languages
+* *Scaled* on demand; instances come and go over time
 
-As discussed above, this is the recommended 12-Factor approach for 
applications running in a docker container.
-The Log4j team does not recommend this approach for performance reasons.
+Naturally, logging systems have also evolved to accommodate these needs.
+In particular, the old practice of _"monoliths writing logs to files rotated daily"_ has changed in two major ways:
 
-image:DockerStdout.png[Stdout, "Application Logging to the Standard Output 
Stream"]
+Applications deliver logs differently::
 
-=== Logging to the Standard Output Stream with the Docker Fluentd Logging 
Driver
+Applications no longer write logs to files, but <<structured-encoding,encode them structurally>> and deliver them to a centrally managed external system.
+Most of the time this is a <<proxy,proxy>> (a library, a sidecar container, etc.) that takes care of discovering the log storage system and forwarding the logs to the right external service.
 
-Docker provides alternate 
https://docs.docker.com/config/containers/logging/configure/[logging drivers],
-such as https://docs.docker.com/config/containers/logging/fluentd/[fluentd], 
that
-can be used to redirect the standard output stream to a log forwarder or log 
aggregator.
+The platform stores logs differently::
 
-When routing to a log forwarder it is expected that the forwarder will have 
the same lifetime as the
-application. If the forwarder should fail the management tools would be 
expected to also terminate
-other containers dependent on the forwarder.
+There is no longer a `/var/log/tomcat/app.log` combining all the logs of a monolith.
+Instead, the software runs in multiple instances, each potentially implemented in a different language, and instances get scaled (i.e., new ones get started, old ones get stopped) on demand.
+To accommodate this, logs are persisted in a central storage system (Elasticsearch, Google Cloud Logging, etc.) that provides advanced navigation and filtering capabilities.
 
-image:DockerFluentd.png[Docker Fluentbit, "Logging via StdOut using the Docker 
Fluentd Logging Driver to Fluent-bit"]
+Log4j Core not only adapts to this evolution, but also strives to provide best-in-class support for it.
+Below, we explore how to integrate Log4j with service-oriented architectures.
 
-As an alternative the logging drivers could be configured to route events 
directly to a logging aggregator.
-This is generally not a good idea as the logging drivers only allow a single 
host and port to be configured.
-The docker documentation isn't clear but infers that log events will be 
dropped when log events cannot be
-delivered so this method should not be used if a highly available solution is 
required.
+[#best-practices]
+== Best practices
 
-image:DockerFluentdAggregator.png[Docker Fluentd, "Logging via StdOut using 
the Docker Fluentd Logging Driver to Fluentd"]
+Regardless of the service-oriented architecture you choose, there are certain best practices we strongly encourage you to follow:
 
-=== Logging to a File
+[#structured-encoding]
+=== Encode logs using a structured layout
 
-While this is not the recommended 12-Factor approach, it performs very well.
-However, it requires that the  application declares a volume where the log 
files will reside and then configures the log forwarder to tail  those files.
-Care must also be taken to automatically manage the disk space used for the 
logs, which Log4j  can perform via the "Delete" action on the 
xref:manual/appenders.adoc#RollingFileAppender[RollingFileAppender].
+We cannot emphasize this enough: use nothing but a xref:manual/layouts.adoc#structured-logging[structured layout] to deliver your logs to an external system.
+We recommend xref:manual/json-template-layout.adoc[] for this purpose:
 
-image:DockerLogFile.png[File, "Logging to a File"]
+* JSON Template Layout provides full customizability and ships with several predefined event templates for popular log storage services.
+* JSON is accepted by virtually every log storage service.
+* JSON is supported by logging frameworks in other languages.
+This makes it possible to agree on a common log format with non-Java applications.
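+
+For example, a minimal configuration sketch (assuming the `log4j-layout-template-json` module is on the classpath and using the bundled ECS event template; adapt the appender and template to your environment) might look like this:
+
+[source,xml]
+----
+<Configuration>
+  <Appenders>
+    <!-- Encode each log event as an ECS-compliant JSON object -->
+    <Console name="CONSOLE">
+      <JsonTemplateLayout eventTemplateUri="classpath:EcsLayout.json"/>
+    </Console>
+  </Appenders>
+  <Loggers>
+    <Root level="INFO">
+      <AppenderRef ref="CONSOLE"/>
+    </Root>
+  </Loggers>
+</Configuration>
+----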
 
-=== Sending Directly to a Log Forwarder via TCP
+[#proxy]
+=== Use a proxy for writing logs
 
-Sending logs directly to a Log Forwarder is simple as it generally just 
requires that the forwarder's host and port be configured on a SocketAppender 
with an appropriate layout.
+Most of the time it is not a good idea to write to the log storage system directly; instead, delegate that task to a proxy.
+This design decouples the applications' log target from the log storage system and, as a result, effectively enables each to evolve independently and reliably (i.e., without downtime).
+For instance, it allows the log storage system to scale or migrate to a new environment while the proxies take care of the necessary buffering and routing.
 
-image:DockerTCP.png[TCP, "Application Logging to a Forwarder via TCP"]
+This proxy can appear in many forms, for instance:
 
-=== Sending Directly to a Log Aggregator via TCP
+* The *console* can act as a proxy.
+Logs written to the console can be consumed by an external service.
+For example, https://12factor.net/logs[The Twelve-Factor App] and the https://kubernetes.io/docs/concepts/cluster-administration/logging/[Kubernetes Logging Architecture] recommend this approach.
 
-Similar to sending logs to a forwarder, logs can also be sent to a cluster of 
aggregators.
-However, setting this up is not as simple since, to be highly available, a 
cluster of aggregators must be used.
-However, the SocketAppender currently can only be configured with a single 
host and port.
-To allow  for failover if the primary aggregator fails the SocketAppender must 
be enclosed in a  
xref:manual/appenders.adoc#FailoverAppender[FailoverAppender], which would also 
have the secondary aggregator configured.
-Another option is to have the SocketAppender  point to a highly available 
proxy that can forward to the Log Aggregator.
+* A *library* can act as a proxy.
+It can tap into the logging API and forward log events to an external service.
+For instance, https://docs.datadoghq.com/logs/log_collection/java[Datadog's Java Log Collector] uses this mechanism.
 
-If the log aggregator used is Apache Flume (or similar) the Appenders for 
these support
-being configured with a list of hosts and ports so high availability is not an 
issue.
+* An external *service* can act as a proxy that applications write their logs to.
+For example, you can write to https://www.elastic.co/logstash[Logstash], a https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent[Kubernetes logging agent sidecar], or a https://redis.io/glossary/redis-queue/[Redis queue] over a socket.
 
-image:LoggerAggregator.png[Aggregator, "Application Logging to an Aggregator 
via TCP"]
+What to use as a proxy depends on your deployment environment.
+Consult your colleagues to see whether there is already an established logging proxy convention.
+Otherwise, we strongly encourage you to establish one in collaboration with your system administrators and architects.
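+
+To illustrate the last point, here is a minimal sketch where the application only targets a log forwarding sidecar listening on `localhost` (the forwarder, host, and port below are placeholders for whatever convention your environment uses):
+
+[source,xml]
+----
+<!-- The application only knows about the local forwarder; the forwarder alone
+     knows (and can change) where the actual log storage system lives. -->
+<Socket name="FORWARDER"
+        host="localhost"
+        port="12345"
+        protocol="TCP">
+  <JsonTemplateLayout eventTemplateUri="classpath:EcsLayout.json"
+                      nullEventDelimiterEnabled="true"/>
+</Socket>
+----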
 
-[#ELK]
-== Logging using Elasticsearch, Logstash, and Kibana
+[#appender]
+=== Configure your appender correctly
 
-There are various approaches with different trade-offs for ingesting logs into
-an ELK stack. Here we will briefly cover how one can forward Log4j generated
-events first to Logstash and then to Elasticsearch.
+Once you decide on <<proxy,the log proxy>> to use, the choice of appender pretty much becomes self-evident.
+Nevertheless, there are some tips we recommend you follow:
 
-=== Log4j Configuration
+* *For writing to the console*, use a xref:manual/appenders.adoc#ConsoleAppender[Console Appender] and make sure to set its `direct` attribute to `true` for maximum efficiency (see the sketch after this list).
 
-==== JsonTemplateLayout
+* *For writing to an external service*, use a xref:manual/appenders.adoc#SocketAppender[Socket Appender] and make sure to enable the null event delimiter of the associated layout.
+For instance, see xref:manual/json-template-layout.adoc#plugin-attr-nullEventDelimiterEnabled[the `nullEventDelimiterEnabled` configuration attribute of JSON Template Layout].
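+
+A Socket Appender sketch appears in the previous section; for the console case, a minimal sketch (the appender name and event template are placeholders) looks like this:
+
+[source,xml]
+----
+<!-- `direct="true"` bypasses `System.out` and writes straight to the underlying
+     file descriptor, which is considerably faster -->
+<Console name="CONSOLE" direct="true">
+  <JsonTemplateLayout eventTemplateUri="classpath:EcsLayout.json"/>
+</Console>
+----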
 
-Log4j provides a multitude of JSON generating layouts. In particular, JSON
-Template Layout allows full schema
-customization and bundles ELK-specific layouts by default, which makes it a
-great fit for the bill. Using the EcsLayout template as shown below will 
generate data in Kibana where
-the message displayed exactly matches the message passed to Log4j and most of 
the event attributes, including
-any exceptions, are present as individual attributes that can be displayed. 
Note, however that stack traces
-will be formatted without newlines.
+[#file]
+=== Avoid writing to files
 
-[source,xml]
-----
-<Socket name="Logstash"
-        host="${sys:logstash.host}"
-        port="12345"
-        protocol="tcp"
-        bufferedIo="true">
-  <JsonTemplateLayout eventTemplateUri="classpath:EcsLayout.json">
-    <EventTemplateAdditionalField key="containerId" 
value="${docker:containerId:-}"/>
-    <EventTemplateAdditionalField key="application" 
value="${lower:${spring:spring.application.name:-spring}}"/>
-    <EventTemplateAdditionalField key="kubernetes.serviceAccountName" 
value="${k8s:accountName:-}"/>
-    <EventTemplateAdditionalField key="kubernetes.containerId" 
value="${k8s:containerId:-}"/>
-    <EventTemplateAdditionalField key="kubernetes.containerName" 
value="${k8s:containerName:-}"/>
-    <EventTemplateAdditionalField key="kubernetes.host" value="${k8s:host:-}"/>
-    <EventTemplateAdditionalField key="kubernetes.labels.app" 
value="${k8s:labels.app:-}"/>
-    <EventTemplateAdditionalField key="kubernetes.labels.pod-template-hash" 
value="${k8s:labels.podTemplateHash:-}"/>
-    <EventTemplateAdditionalField key="kubernetes.master_url" 
value="${k8s:masterUrl:-}"/>
-    <EventTemplateAdditionalField key="kubernetes.namespaceId" 
value="${k8s:namespaceId:-}"/>
-    <EventTemplateAdditionalField key="kubernetes.namespaceName" 
value="${k8s:namespaceName:-}"/>
-    <EventTemplateAdditionalField key="kubernetes.podID" 
value="${k8s:podId:-}"/>
-    <EventTemplateAdditionalField key="kubernetes.podIP" 
value="${k8s:podIp:-}"/>
-    <EventTemplateAdditionalField key="kubernetes.podName" 
value="${k8s:podName:-}"/>
-    <EventTemplateAdditionalField key="kubernetes.imageId" 
value="${k8s:imageId:-}"/>
-    <EventTemplateAdditionalField key="kubernetes.imageName" 
value="${k8s:imageName:-}"/>
-  </JsonTemplateLayout>
-</Socket>
-----
+As explained in <<motivation>>, in a service-oriented architecture, log files 
are
 
-==== Gelf Template
+* Difficult to maintain – writable volumes must be mounted to the runtime (container, VM, etc.), and the files must be rotated and monitored for excessive disk usage
+* Difficult to use – multiple files need to be manually combined while troubleshooting, and there is no central navigation point
+* Difficult to interoperate – each application needs to be individually configured to produce the same structured log output to enable interleaving of logs from multiple sources while troubleshooting distributed issues
 
-The JsonTemplateLayout can also be used to generate JSON that matches the GELF 
specification which can format
-the message attribute using a pattern in accordance with the PatternLayout. 
For example, the following
-template, named EnhancedGelf.json, can be used to generate GELF-compliant data 
that can be passed to Logstash.
-With this template the message attribute will include the thread id, level, 
specific ThreadContext attributes,
-the class name, method name, and line number as well as the message. If an 
exception is included it will also
-be included with newlines. This format follows very closely what you would see 
in a typical log file on disk
-using the PatternLayout but has the additional advantage of including the 
attributes as separate fields that
-can be queried.
+In short, *we don't recommend writing logs to files*.
 
-[source,json]
-----
-{
-    "version": "1.1",
-    "host": "${hostName}",
-    "short_message": {
-        "$resolver": "message",
-        "stringified": true
-    },
-    "full_message": {
-        "$resolver": "message",
-        "pattern": "[%t] %-5p %X{requestId, sessionId, loginId, userId, 
ipAddress, corpAcctNumber} %C{1.}.%M:%L - %m",
-        "stringified": true
-    },
-    "timestamp": {
-        "$resolver": "timestamp",
-        "epoch": {
-            "unit": "secs"
-        }
-    },
-    "level": {
-        "$resolver": "level",
-        "field": "severity",
-        "severity": {
-            "field": "code"
-        }
-    },
-    "_logger": {
-        "$resolver": "logger",
-        "field": "name"
-    },
-    "_thread": {
-        "$resolver": "thread",
-        "field": "name"
-    },
-    "_mdc": {
-        "$resolver": "mdc",
-        "flatten": {
-            "prefix": "_"
-        },
-        "stringified": true
-    }
-}
-----
+A common excuse for writing logs to files is to use them as a buffer prior to ingesting them into either the logging system proxy or the log storage system.
+Even in this case, most of the time it is a sign of bad architectural design, and it costs extra resources and degrades efficiency.

Review Comment:
   What I mean is that the practice of buffering logs in log files might still be valid.
   However, the logic that does that probably should not be in the application itself, but in the sidecar service that forwards the logs from the application.


