Your use case sounds to me like this: "I want to use `PatternLayout` for exchanging data between two systems and ... [it is insecure.]" (Please correct me if I am wrong.) My answer is: "Don't".
`PatternLayout` is not designed to be machine-readable. If I am not mistaken, there is not even a standard format for stack traces. Consider stack traces generated from exceptions whose messages contain newline characters. How are you going to parse those? Or thread names, custom levels, custom markers, etc. containing a newline? My point is: don't use `PatternLayout` for exchanging data between systems. For that purpose, we recommend structured layouts, e.g., `JsonTemplateLayout`. ELK, Splunk, Datadog, New Relic, etc. all accept JSON. In conclusion, I recommend using JTL (`JsonTemplateLayout`) for publishing logs to other systems. If you have `PatternLayout` [encoder?] enhancements that we can incorporate in a backward-compatible way, please share.

On Tue, Oct 10, 2023 at 6:04 PM Klebanov, Vladimir <vladimir.kleba...@sap.com.invalid> wrote:
> Hi Volkan,
>
> Let me try to clarify. The goal/use case is not to log as an HTML document. We are assuming a typical text-based log here. Yet, in practice, the logs will be processed by a variety of systems, including web-based ones, which may have various vulnerabilities. Attackers can exploit these vulnerabilities if they can use the log-producing application to inject various strings into the log.
>
> (At this point, I would like to refer to the context paragraph of my previous message.)
>
> Here is an example scenario spelled out. An application uses log4j to produce a text log and logs the username supplied by the user in every login attempt. The log is ingested into Splunk (or ELK), as it often is. An attacker can try to log in with the username "<script>...", which will appear verbatim in the text log. When this log message is rendered in Splunk, it will appear in an HTML context. If Splunk has an XSS vulnerability, then that piece of JS code will be executed, with any of the negative effects that an XSS may have.
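To make the login scenario above concrete, here is a minimal plain-Java sketch (no Log4j classes involved) of what escaping the user-supplied username does before the value reaches a line-oriented text log. The class `UsernameEscapingSketch` and its `escape` and `record` methods are hypothetical and were written only for this illustration. They resemble the HTML mode of Log4j's `%enc` converter in spirit but are not its implementation. Besides the HTML metacharacters, the sketch also turns CR and LF into the two-character sequences `\r` and `\n`. A crafted username therefore can neither smuggle markup into an HTML-rendering viewer nor start a forged log record.

```java
public class UsernameEscapingSketch {

    // Hypothetical helper (not a Log4j API): escapes HTML metacharacters and
    // line breaks in an untrusted value before it goes into a one-line record.
    static String escape(String untrusted) {
        StringBuilder out = new StringBuilder(untrusted.length());
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                case '\r': out.append("\\r"); break; // CR -> backslash + 'r'
                case '\n': out.append("\\n"); break; // LF -> backslash + 'n'
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical stand-in for a line-oriented text layout: one record per line.
    static String record(String username) {
        return "INFO login attempt user=" + username + "\n";
    }

    public static void main(String[] args) {
        String[] usernames = {
            "<script>alert(1)</script>",          // XSS payload from the scenario
            "bob\nINFO login succeeded user=admin" // newline injection forging a record
        };
        for (String u : usernames) {
            System.out.print("raw:     " + record(u));
            System.out.print("escaped: " + record(escape(u)));
        }
    }
}
```

With the raw value, the second username produces an extra line that reads like a genuine `login succeeded` record. The escaped value stays on a single line, and the `<script>` tag arrives as inert text.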
>
> Using an HTML encoder in the log-producing application (like the one already available in log4j) would add an extra layer of protection against vulnerabilities in the log-processing systems. Yet, these log4j encoders could be improved, as described in my previous message.
>
> There are many other scenarios in the same conceptual class. If an attacker can inject newlines, they can forge (i.e., fake) log records, regardless of how the logs are processed further. If an attacker can inject ANSI sequences, they can make some log records invisible when the log is viewed in a tool like `less`. Etc., etc.
>
> I hope it is clearer now. Let me know if not.
>
> Thanks,
> Vladimir
>
> -----Original Message-----
> From: Volkan Yazıcı <vol...@yazi.ci>
> Sent: Monday, 9 October 2023 22:29
> To: dev@logging.apache.org
> Subject: Re: [log4j] Improving log4j security
>
> *[I am sharing my earlier response (almost) verbatim below.]*
>
> I would like to address both your old and your most recent email *myself*; that is, this reflects only my personal view, not that of the PMC.
>
> > > An HTML-safe layout is only achieved if defined akin to:
> > >
> > > <PatternLayout pattern="%d{HH:...
>
> The definition of *HTML-safe* needs some explanation here. If you mean that the rendering should be a valid HTML document in which the input is sufficiently escaped, then the `PatternLayout` configuration you shared certainly won't produce that. Indeed, the implicit injection of the stack trace is unexpected, yet the first, unencoded directives you provided already break HTML safety. Imagine my thread name is `<html>`, etc. My point is: if you want your layout to produce valid HTML for each rendered log event, you should be using `HtmlLayout`. The same applies to JSON too.
> You should use `JsonTemplateLayout`, not `<PatternLayout pattern="%enc{%m}{JSON}%n"/>`.
>
> > Would Log4j be willing to improve the usability of encoding in pattern layouts to make it less likely for users to shoot themselves in the foot?
>
> We provide best-in-class support for JSON, HTML, etc. through their dedicated layouts. If users insist on using `PatternLayout` for those purposes, it feels to me like somebody stubbornly trying to pass SQL arguments via string concatenation.
>
> Nevertheless, if you have any proposals on _"improving the usability of encoding in pattern layouts to make it less likely for users to shoot themselves in the foot"_, you are more than welcome! The entire Log4j crew would be happy to assist you with such contributions.
>
> > > I did go ahead and create a proof-of-concept encoder for log4j that securely encodes exceptions without completely mangling the stack traces:
> > >
> > > https://github.com/vlkl-sap/log4j-encoder
> > >
> > > There are two different implementations of the encoder with different trade-offs (to be discussed). I also implemented a new, more encompassing text encoder, based on URL encoding, but this aspect is independent.
>
> Before writing any code, would you mind helping us with the following questions, please?
>
> 1. Do you have a use case? If so, where does `HtmlLayout` fall short of addressing it?
> 2. Assuming `HtmlLayout` doesn't address your needs, can we [in a backward-compatible manner] improve `HtmlLayout` to make it work for you?
> 3. Can we [in a backward-compatible manner] incorporate your `PatternLayout` changes?
>
> Kind regards.
>
> On Mon, Oct 9, 2023 at 5:24 PM Klebanov, Vladimir <vladimir.kleba...@sap.com.invalid> wrote:
>
> > Thanks, Piotr. I don't know what happened to your replies (maybe the spam filter dropped them), but I am happy that we have recovered from that now.
> >
> > Log injections are definitely security issues, but if you prefer to talk about them in the open, I will follow suit.
> >
> > For context: a log injection occurs when an application logs user-supplied data (which is often the case). Attackers can exploit log injection to forge log records and impede forensics, or to exploit potential vulnerabilities in log-processing systems. There is a variety of string classes that attackers can try to inject, including newlines, ANSI sequences, Unicode direction markers, Unicode homographs, JavaScript, PHP, etc.
> >
> > Ideally, applications defend against log injection attacks by encoding (aka escaping) user-supplied data before logging. The specific encoding depends on the desired level of protection. URL-encoding, for instance, would protect against all of the above-mentioned attack classes, but weaker encodings may sometimes be acceptable as well.
> >
> > A natural place to implement encoding is the pattern layout configuration. Some encoding pattern converters are already available in log4j, but there are still gaps that I would like to help fill. I think there are roughly three of them:
> >
> > 1. The documentation should explain the issue more prominently. Today, most users would probably think that the following layout is HTML-safe, while it's not (the thread name and the implicitly appended exception stack trace, for instance, are not encoded):
> >
> >    <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level - %enc{%m}{HTML}%n"/>
> >
> > 2. The HTML encoder is not always sufficient. I would like to see a stricter one added, such as a URL encoder.
> >
> > 3. The current encoders encode all structured data (like the complete exception stack trace) and not just the injection-prone parts (i.e., the exception message).
> >    This means I cannot replace the insecure layout above with the secure layout
> >
> >    <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level - %enc{%m}{HTML} %enc{%xEx}{HTML}%n"/>
> >
> >    without changing how logs are parsed (as the stack frames will no longer be separated by newlines).
> >
> > I have created a PoC implementation of an improved encoder, but I would obviously need help to make it production-ready. Is anyone here interested in that? Questions and comments are welcome as well.
> >
> > Thanks,
> > Vladimir
> >
> > -----Original Message-----
> > From: Piotr P. Karwasz <piotr.karw...@gmail.com>
> > Sent: Thursday, 5 October 2023 22:06
> > To: dev@logging.apache.org; Klebanov, Vladimir <vladimir.kleba...@sap.com>
> > Subject: Re: [log4j] Improving log4j security
> >
> > Hi Vladimir,
> >
> > On Thu, 5 Oct 2023 at 21:47, Klebanov, Vladimir <vladimir.kleba...@sap.com.invalid> wrote:
> > > I would like to contribute some code in order to make log4j usage more secure. I have now sent two emails to the log4j security team but did not receive a response. Is anybody here interested? How can we discuss this further?
> >
> > Both times (10 Aug 2023, 23:19 and 29 Aug 2023, 20:49) we sent an answer to your address at sap.com.
> >
> > Anyway, the general consensus was that the issue of generating HTML using PatternLayout does not constitute a security problem, and that you can discuss it on this mailing list or file an issue in GitHub Issues.
> >
> > Piotr
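Points 2 and 3 of Vladimir's October 9 message call for a stricter, URL-style encoder that encodes only the injection-prone exception message instead of the whole stack trace. A small plain-Java sketch can illustrate the idea. It is neither the PoC at https://github.com/vlkl-sap/log4j-encoder nor a Log4j API. The class `MessageOnlyEncodingSketch` and its `render` method are hypothetical. The sketch assumes that exception class names and stack frames come from the application's own code and can stay unencoded. Exception messages, which may carry user input, are percent-encoded with `java.net.URLEncoder`. Stack frames stay on separate lines, so line-based parsing of stack traces keeps working.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class MessageOnlyEncodingSketch {

    // Hypothetical renderer: percent-encodes only the (potentially
    // user-controlled) exception message and keeps one stack frame per line.
    // Causes and suppressed exceptions are ignored to keep the sketch short.
    static String render(Throwable t) {
        StringBuilder sb = new StringBuilder();
        // Assumption: the class name comes from code, not from user input.
        sb.append(t.getClass().getName());
        String message = t.getMessage();
        if (message != null) {
            // URLEncoder applies form encoding: space -> '+',
            // everything except [A-Za-z0-9.*_-] -> %XX (UTF-8 bytes).
            sb.append(": ").append(URLEncoder.encode(message, StandardCharsets.UTF_8));
        }
        for (StackTraceElement frame : t.getStackTrace()) {
            // Frames are left as-is and remain newline-separated.
            sb.append('\n').append("\tat ").append(frame);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Exception e = new IllegalArgumentException(
                "unknown user <script>alert(1)</script>\nFAKE RECORD");
        System.out.println(render(e));
    }
}
```

The design trade-off shows up directly. The message can no longer break lines or carry markup, but URL-encoding costs readability (a space becomes `+`). A real converter would also have to decide how to render causes and suppressed exceptions, which this sketch leaves out.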