Copilot commented on code in PR #4077:
URL: https://github.com/apache/logging-log4j2/pull/4077#discussion_r2983764216
##########
log4j-core/pom.xml:
##########
@@ -66,7 +66,7 @@
org.apache.commons.compress.*;resolution:=optional,
org.apache.commons.csv;resolution:=optional,
org.apache.kafka.*;resolution:=optional,
Review Comment:
The BND import pattern `org.codehaus.stax2.*` does not include the base
package `org.codehaus.stax2`, but this module uses
`org.codehaus.stax2.XMLStreamWriter2` (and other types in the base package).
This can cause OSGi resolution failures. Include both `org.codehaus.stax2` and
`org.codehaus.stax2.*` (or use a pattern that matches both) in
`bnd-extra-package-options`.
```suggestion
org.apache.kafka.*;resolution:=optional,
org.codehaus.stax2;resolution:=optional,
```
##########
log4j-core/src/main/java/org/apache/logging/log4j/core/jackson/Log4jXmlObjectMapper.java:
##########
@@ -41,7 +51,144 @@ public Log4jXmlObjectMapper() {
* Create a new instance using the {@link Log4jXmlModule}.
*/
public Log4jXmlObjectMapper(final boolean includeStacktrace, final boolean
stacktraceAsString) {
- super(new Log4jXmlModule(includeStacktrace, stacktraceAsString));
+ super(new SanitizingXmlFactory(), new
Log4jXmlModule(includeStacktrace, stacktraceAsString));
this.setSerializationInclusion(JsonInclude.Include.NON_EMPTY);
}
+
+ /**
+ * Writer that sanitizes text to be valid XML 1.0 by replacing disallowed
code points with the replacement character (U+FFFD).
+ */
+ private static final class SanitizingWriter extends StreamWriter2Delegate {
+
+ private static final char REPLACEMENT_CHAR = '\uFFFD';
+
+ SanitizingWriter(final XMLStreamWriter2 delegate) {
+ super(delegate);
+ setParent(delegate);
+ }
+
+ @Override
+ public void writeAttribute(final String localName, final String value)
throws XMLStreamException {
+ super.writeAttribute(localName, sanitizeXml10(value));
+ }
+
+ @Override
+ public void writeAttribute(final String namespaceURI, final String
localName, final String value)
+ throws XMLStreamException {
+ super.writeAttribute(namespaceURI, localName,
sanitizeXml10(value));
+ }
+
+ @Override
+ public void writeAttribute(
+ final String prefix, final String namespaceURI, final String
localName, final String value)
+ throws XMLStreamException {
+ super.writeAttribute(prefix, namespaceURI, localName,
sanitizeXml10(value));
+ }
+
+ @Override
+ public void writeCData(String text) throws XMLStreamException {
+ super.writeCData(sanitizeXml10(text));
+ }
+
+ @Override
+ public void writeCData(char[] text, int start, int len) throws
XMLStreamException {
+ super.writeCData(sanitizeXml10(text, start, len));
+ }
+
+ @Override
+ public void writeCharacters(final String text) throws
XMLStreamException {
+ super.writeCharacters(sanitizeXml10(text));
+ }
+
+ @Override
+ public void writeCharacters(final char[] text, final int start, final
int len) throws XMLStreamException {
+ super.writeCharacters(sanitizeXml10(text, start, len));
+ }
+
+ @Override
+ public void writeComment(String text) throws XMLStreamException {
+ super.writeComment(sanitizeXml10(text));
+ }
+
+ private static String sanitizeXml10(final String input) {
+ if (input == null) {
+ return null;
+ }
+ final int length = input.length();
+ // Only create a new string if we find an invalid code point.
+ // In the common case, this should avoid unnecessary allocations.
+ for (int i = 0; i < length; ) {
+ final int cp = input.codePointAt(i);
+ if (!isValidXml10(cp)) {
+ final StringBuilder out = new StringBuilder(length);
+ out.append(input, 0, i);
+ appendSanitized(input, i, length, out);
+ return out.toString();
+ }
+ i += Character.charCount(cp);
+ }
+ return input;
+ }
+
+ private static String sanitizeXml10(final char[] input, final int
start, final int len) {
+ return sanitizeXml10(new String(input, start, len));
+ }
+
+ private static void appendSanitized(final String input, int i, final
int length, final StringBuilder out) {
+ while (i < length) {
+ final int cp = input.codePointAt(i);
+ out.appendCodePoint(isValidXml10(cp) ? cp : REPLACEMENT_CHAR);
+ i += Character.charCount(cp);
+ }
+ }
+
+ /**
+ * Checks if a code point is valid
+ *
+ * @param codePoint a code point between {@code 0} and {@link
Character#MAX_CODE_POINT}
+ * @return {@code true} if it is a valid XML 1.0 code point
+ */
+ private static boolean isValidXml10(final int codePoint) {
+ assert codePoint >= 0 && codePoint <= Character.MAX_CODE_POINT;
+ // XML 1.0 valid characters (Fifth Edition):
+ // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]
+
+ // [#x20–#xD7FF] (placed early as a fast path for the most common
case)
+ return (codePoint >= ' ' && codePoint < Character.MIN_SURROGATE)
+ // #x9
+ || codePoint == '\t'
+ // #xA
+ || codePoint == '\n'
+ // #xD
+ || codePoint == '\r'
+ // [#xE000-#xFFFD]
+ || (codePoint > Character.MAX_SURROGATE && codePoint <=
0xFFFD)
+ // [#x10000-#x10FFFF]
+ || codePoint >= Character.MIN_SUPPLEMENTARY_CODE_POINT;
+ }
+ }
+
+ /**
+ * Factory that creates {@link SanitizingWriter} instances to ensure that
all text written to the XML output is valid XML 1.0.
+ */
+ private static final class SanitizingXmlFactory extends XmlFactory {
+
+ private static final long serialVersionUID = 1L;
+
+ @Override
+ protected XMLStreamWriter _createXmlWriter(final IOContext ctxt, final
Writer w) throws IOException {
+ return new
SanitizingWriter(Stax2WriterAdapter.wrapIfNecessary(super._createXmlWriter(ctxt,
w)));
+ }
+
+ @Override
+ protected XMLStreamWriter _createXmlWriter(final IOContext ctxt, final
OutputStream out) throws IOException {
+ return new
SanitizingWriter(Stax2WriterAdapter.wrapIfNecessary(super._createXmlWriter(ctxt,
out)));
+ }
+
+ @Override
+ public XmlFactory copy() {
+ _checkInvalidCopy(SanitizingXmlFactory.class);
+ return new SanitizingXmlFactory();
+ }
Review Comment:
`SanitizingXmlFactory.copy()` currently returns a brand new factory without
copying configuration/state from the original `XmlFactory`. Jackson calls
`JsonFactory/XmlFactory.copy()` when copying mappers/writers, and returning a
default instance can silently drop feature flags and factory configuration.
Implement `copy()` so it copies from `this` (typically by adding a copy
constructor and delegating to `super(src, codec)`/equivalent) instead of
returning a fresh default factory.
##########
log4j-core/src/main/java/org/apache/logging/log4j/core/jackson/Log4jXmlObjectMapper.java:
##########
@@ -41,7 +51,144 @@ public Log4jXmlObjectMapper() {
* Create a new instance using the {@link Log4jXmlModule}.
*/
public Log4jXmlObjectMapper(final boolean includeStacktrace, final boolean
stacktraceAsString) {
- super(new Log4jXmlModule(includeStacktrace, stacktraceAsString));
+ super(new SanitizingXmlFactory(), new
Log4jXmlModule(includeStacktrace, stacktraceAsString));
this.setSerializationInclusion(JsonInclude.Include.NON_EMPTY);
}
+
+ /**
+ * Writer that sanitizes text to be valid XML 1.0 by replacing disallowed
code points with the replacement character (U+FFFD).
+ */
+ private static final class SanitizingWriter extends StreamWriter2Delegate {
+
+ private static final char REPLACEMENT_CHAR = '\uFFFD';
+
+ SanitizingWriter(final XMLStreamWriter2 delegate) {
+ super(delegate);
+ setParent(delegate);
+ }
+
+ @Override
+ public void writeAttribute(final String localName, final String value)
throws XMLStreamException {
+ super.writeAttribute(localName, sanitizeXml10(value));
+ }
+
+ @Override
+ public void writeAttribute(final String namespaceURI, final String
localName, final String value)
+ throws XMLStreamException {
+ super.writeAttribute(namespaceURI, localName,
sanitizeXml10(value));
+ }
+
+ @Override
+ public void writeAttribute(
+ final String prefix, final String namespaceURI, final String
localName, final String value)
+ throws XMLStreamException {
+ super.writeAttribute(prefix, namespaceURI, localName,
sanitizeXml10(value));
+ }
+
+ @Override
+ public void writeCData(String text) throws XMLStreamException {
+ super.writeCData(sanitizeXml10(text));
+ }
+
+ @Override
+ public void writeCData(char[] text, int start, int len) throws
XMLStreamException {
+ super.writeCData(sanitizeXml10(text, start, len));
+ }
+
+ @Override
+ public void writeCharacters(final String text) throws
XMLStreamException {
+ super.writeCharacters(sanitizeXml10(text));
+ }
+
+ @Override
+ public void writeCharacters(final char[] text, final int start, final
int len) throws XMLStreamException {
+ super.writeCharacters(sanitizeXml10(text, start, len));
+ }
+
+ @Override
+ public void writeComment(String text) throws XMLStreamException {
+ super.writeComment(sanitizeXml10(text));
+ }
+
+ private static String sanitizeXml10(final String input) {
+ if (input == null) {
+ return null;
+ }
+ final int length = input.length();
+ // Only create a new string if we find an invalid code point.
+ // In the common case, this should avoid unnecessary allocations.
+ for (int i = 0; i < length; ) {
+ final int cp = input.codePointAt(i);
+ if (!isValidXml10(cp)) {
+ final StringBuilder out = new StringBuilder(length);
+ out.append(input, 0, i);
+ appendSanitized(input, i, length, out);
+ return out.toString();
+ }
+ i += Character.charCount(cp);
+ }
+ return input;
+ }
+
+ private static String sanitizeXml10(final char[] input, final int
start, final int len) {
+ return sanitizeXml10(new String(input, start, len));
+ }
Review Comment:
`sanitizeXml10(char[], start, len)` always allocates a new `String` even
when the input contains only valid XML characters, which undermines the “avoid
unnecessary allocations” intent and can add overhead on hot logging paths.
Consider implementing a char[]/code-point scan that only allocates when an
invalid code point is encountered (mirroring the `String` path).
##########
log4j-core-test/src/test/java/org/apache/logging/log4j/core/layout/XmlLayoutTest.java:
##########
@@ -397,4 +403,87 @@ public void testIncludeNullDelimiterFalse() {
final String str =
layout.toSerializable(LogEventFixtures.createLogEvent());
assertFalse(str.endsWith("\0"));
}
+
+ // In Java 11 this can be replaced with Character.toString
+ private static String codePointToString(final int codePoint) {
+ return new String(Character.toChars(codePoint));
+ }
Review Comment:
`codePointToString` is unused in this test class. If it’s not needed for the
assertions/data providers, please remove it to avoid dead code in the test
suite.
```suggestion
```
##########
log4j-core-test/src/test/java/org/apache/logging/log4j/core/layout/XmlLayoutTest.java:
##########
@@ -46,13 +47,18 @@
import org.apache.logging.log4j.core.util.KeyValuePair;
import org.apache.logging.log4j.message.SimpleMessage;
import org.apache.logging.log4j.spi.AbstractLogger;
+import org.apache.logging.log4j.spi.DefaultThreadContextStack;
import org.apache.logging.log4j.test.junit.ThreadContextRule;
+import org.apache.logging.log4j.util.StringMap;
+import org.assertj.core.api.Assertions;
import org.junit.AfterClass;
import org.junit.Assert;
import org.junit.BeforeClass;
import org.junit.Rule;
import org.junit.Test;
import org.junit.experimental.categories.Category;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.ValueSource;
Review Comment:
This test class mixes JUnit 4 (`@Test`, `@BeforeClass`, `@Rule`) with JUnit
Jupiter parameterized tests. Under the JUnit Platform, the Vintage engine will
run the JUnit 4 tests (including `@BeforeClass/@AfterClass`), while the Jupiter
engine will run the new `@ParameterizedTest` methods *without* executing the
JUnit 4 lifecycle methods or `@Rule`, which can lead to global state/setup
differences and flaky behavior. Prefer either (a) rewriting the new tests as
JUnit 4 tests, or (b) moving them to a separate Jupiter-only test class and
using Jupiter lifecycle/extension equivalents (`@BeforeAll/@AfterAll`,
`@RegisterExtension`) as needed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]