Yes, that is a good idea. Probably XML and YAML as well.

Ralph
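For context: the "good idea" refers to the public parsing facade discussed in the thread below. A purely hypothetical sketch of what such a facade could look like — the interface and method names here are invented for illustration and are not an existing Log4j API:

import java.io.IOException;

import org.apache.logging.log4j.core.LogEvent;

/**
 * Hypothetical facade for parsing serialized log events (JSON, XML, YAML)
 * without exposing the underlying Jackson machinery. A sketch of the idea
 * discussed in this thread, not actual Log4j code.
 */
public interface LogEventParser {

    /** Parses a single log event from its serialized form. */
    LogEvent parseFrom(byte[] input) throws IOException;

    /** Parses a single log event from a region of a larger buffer. */
    LogEvent parseFrom(byte[] input, int offset, int length) throws IOException;
}

One implementation per format, each backed by the corresponding Jackson ObjectMapper, would let Lilith, our SocketServer, and possibly Chainsaw parse events without depending on Jackson directly.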
> On Jul 22, 2017, at 11:21 AM, Mikael Ståldal <mi...@apache.org> wrote:
>
> But here we have a concrete use case: a third-party tool, Lilith, needs to parse log events from our JsonLayout. (Our SocketServer in log4j-server has the same need, and it uses this class.)
>
> Should we provide a public API in log4j-core for doing so, which Lilith, our SocketServer, and possibly others (Chainsaw maybe?) can use? Maybe such a public API should be a facade around this which hides the fact that we use Jackson.
>
>
> On 2017-07-22 19:04, Gary Gregory wrote:
>> Hi All,
>>
>> Just like anything in log4j-core, we try not to break binary compatibility when we can, but we do not bend over backwards like we do for log4j-api. This class is public probably because Jackson requires it to be; otherwise I would have made it package private. The fact that we use Jackson for JSON IO is an implementation detail IMO, not an extension point. If anyone wants to hard-code a reference to log4j-core's guts then I think it's fair to say that this comes with caveats.
>>
>> Gary
>>
>> On Sat, Jul 22, 2017 at 7:27 AM, Jörn Huxhorn <jhuxh...@googlemail.com> wrote:
>>> Well… it says
>>>
>>> /**
>>>  * A Jackson JSON {@link ObjectMapper} initialized for Log4j.
>>>  * <p>
>>>  * <em>Consider this class private.</em>
>>>  * </p>
>>>  */
>>>
>>> so I wanted to holler about how to continue.
>>>
>>> I’ll give it a shot.
>>>
>>>
>>> On 22. July 2017 at 15:06:28, Mikael Ståldal (mi...@apache.org) wrote:
>>>> Would it work to use org.apache.logging.log4j.core.jackson.Log4jJsonObjectMapper, which is public?
>>>>
>>>> See here how it is used:
>>>>
>>>> https://github.com/apache/logging-log4j-tools/blob/master/log4j-server/src/main/java/org/apache/logging/log4j/server/JsonInputStreamLogEventBridge.java
>>>>
>>>>
>>>> On 2017-07-22 13:47, Jörn Huxhorn wrote:
>>>>> It seems like I don’t get access to the log4j2 ObjectMapper or Jackson Module used for JSON serialization, i.e. I can’t do something like this:
>>>>>
>>>>> new JacksonFactory.JSON(encodeThreadContextAsList, includeStacktrace, false)
>>>>>
>>>>> It would make sense to reuse your ObjectMapper/Module in my code instead of reimplementing them, to prevent future compatibility issues down the road.
>>>>>
>>>>>
>>>>> On 19. July 2017 at 23:04:26, Jörn Huxhorn (jhuxh...@googlemail.com) wrote:
>>>>>> I’ll give this a shot over the next few days.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>> On 19. July 2017 at 21:33:49, Mikael Ståldal (mi...@apache.org) wrote:
>>>>>>> JsonLayout (and XmlLayout and YamlLayout) now supports 0-byte termination of log events. Will be part of the upcoming 2.9 release.
>>>>>>>
>>>>>>>
>>>>>>> On 2017-07-18 11:48, Jörn Huxhorn wrote:
>>>>>>>> A general event header and footer would probably be overkill.
>>>>>>>>
>>>>>>>> I’m using the following interface in Lilith:
>>>>>>>>
>>>>>>>> public interface WriteByteStrategy {
>>>>>>>>     void writeBytes(DataOutputStream dataOutputStream, byte[] bytes)
>>>>>>>>         throws IOException;
>>>>>>>> }
>>>>>>>>
>>>>>>>> One implementation writes an int with the length of bytes before “bytes”, the other implementation writes a 0-byte after “bytes”.
>>>>>>>>
>>>>>>>> The problem with that approach is that it requires the intermediate “bytes” array and is thus probably incompatible with your zero-garbage approach.
>>>>>>>>
>>>>>>>> I didn’t try end-of-line (\r\n) mainly because I first wanted to discuss this with you.
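The actual Lilith implementations aren't shown in the thread; a minimal sketch of the two strategies Jörn describes above, implementing his WriteByteStrategy interface (class names are invented for illustration):

import java.io.DataOutputStream;
import java.io.IOException;

// Length-prefixed variant: writes the number of bytes first, then the bytes.
class LengthPrefixWriteByteStrategy implements WriteByteStrategy {
    @Override
    public void writeBytes(DataOutputStream dataOutputStream, byte[] bytes) throws IOException {
        dataOutputStream.writeInt(bytes.length); // 4-byte big-endian length prefix
        dataOutputStream.write(bytes);
    }
}

// Zero-byte-terminated variant: writes the bytes, then a single 0-byte.
// Only safe for payloads that cannot contain 0x00, such as JSON or XML text.
class ZeroByteTerminatedWriteByteStrategy implements WriteByteStrategy {
    @Override
    public void writeBytes(DataOutputStream dataOutputStream, byte[] bytes) throws IOException {
        dataOutputStream.write(bytes);
        dataOutputStream.write(0);
    }
}

Both need the intermediate byte[] that Jörn mentions, which is what clashes with the zero-garbage encoders.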
>>>>>>>> I would also have to implement a new receiver type, whereas I could just reuse an existing one in case of 0-byte termination.
>>>>>>>>
>>>>>>>> I’d prefer using 0-byte termination since it would be less fragile than end-of-line. The end-of-line approach would break if compact="false" was set erroneously, while you’d still have the option of using either compact="true" or compact="false" in case of 0-byte termination. 0-byte termination is also slightly more efficient while reading, since the splitting can operate on bytes instead of chars or Strings - another reason why I didn’t just implement it and instead wanted to discuss this with you first.
>>>>>>>>
>>>>>>>> Joern
>>>>>>>>
>>>>>>>>
>>>>>>>> On 18. July 2017 at 03:26:32, Ralph Goers (ralph.go...@dslextreme.com) wrote:
>>>>>>>>> Right now that is all handled by the specific layouts. For example, by default the RFC5424Layout doesn’t append newlines, so when writing to a file they will all be on the same “line”, but it has an option to append one if needed. Doing the same would be another option for the JSONLayout.
>>>>>>>>>
>>>>>>>>> I’d be OK with event headers and footers, but ONLY if they have zero overhead when not present. IOW, when the Layout is initialized it would have to wire in the appropriate encode method.
>>>>>>>>>
>>>>>>>>> Ralph
>>>>>>>>>
>>>>>>>>>> On Jul 17, 2017, at 3:06 PM, Gary Gregory wrote:
>>>>>>>>>>
>>>>>>>>>> Do we want a general event header and footer then?
>>>>>>>>>>
>>>>>>>>>> Gary
>>>>>>>>>>
>>>>>>>>>> On Jul 17, 2017 14:43, "Ralph Goers" wrote:
>>>>>>>>>>
>>>>>>>>>>> No. A footer is only used at the end of a file. He needs to know how long each event is, or when it is the start of a new event.
>>>>>>>>>>>
>>>>>>>>>>> Ralph
>>>>>>>>>>>
>>>>>>>>>>>> On Jul 17, 2017, at 12:32 PM, Gary Gregory wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Can't you use a footer for any terminator you wish?
>>>>>>>>>>>>
>>>>>>>>>>>> Gary
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 17, 2017 at 12:13 PM, Mikael Ståldal wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi.
>>>>>>>>>>>>>
>>>>>>>>>>>>> (Moving this discussion to the logging dev mailing list.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Have you tried to use:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then each log event will be terminated by end-of-line (\r\n).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think it would be easy to implement 0-byte terminated log events in JsonLayout, and that would make sense since we have implemented support for that in GelfLayout. Created a JIRA issue for it:
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/LOG4J2-1981
>>>>>>>>>>>>>
>>>>>>>>>>>>> As for other tools, the only receivers for Log4j 2 SerializedLayout we know of are Log4j's own SocketServer and Lilith. Chainsaw currently only supports Log4j 1 SerializedLayout. Log4j's own SocketServer supports JsonLayout and XmlLayout as well, and we are changing it to use JsonLayout by default.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2017-07-17 15:01, Joern Huxhorn wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Mikael,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've just taken a look at the JsonLayout and see no way to parse its output event by event.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's semi-fine for files because then it will result in a well-formed array at the root level that can be read in one big swoop. It isn't really ideal for that use case either, since it means that the whole array containing all events needs to be read into memory in one piece, causing OOM in case of huge files.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But in Lilith, I really need to process events one by one. This is absolutely crucial.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The only way to achieve this would essentially require me to re-implement a JSON parser that can cope with data like this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I implemented a similar kludge to support reading of log4j xml files (see https://github.com/huxi/lilith/blob/master/log4j/log4j-xml/src/main/java/de/huxhorn/lilith/log4j/xml/Log4jImportCallable.java) but this was way easier since I could simply search for "" to find the position at which I could split the stream of events into a valid XML document for every single event. This was still kind of nasty but doable, especially since processing an "offline" XML file doesn't have the same performance restriction as handling of live events.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I wouldn't have the luxury of such a unique split signal in JSON, though.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would need to detect the "}" at the level of the root array, which would not simply involve counting of opening and closing brackets but also ignoring of brackets inside of strings. This would boil down to implementing a custom JSON reader, and I won't be doing this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This isn't just me being lazy either. Such an implementation wouldn't be resource (CPU/memory) friendly anyway.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In my own JSON receiver using my own JSON format there are two ways to send those events:
>>>>>>>>>>>>>> - write an int containing the number of bytes representing the event, then the bytes of the event. This type of event also supports compression.
>>>>>>>>>>>>>> - write the bytes of the event, followed by a 0-byte. This works fine, and JavaScript/ActionScript is able to produce events like that while they are unable (or were unable 7 years ago) to count the bytes on their own. This type of event only supports plain text like JSON or XML and no compression, since compressed events could contain the 0-byte while XML and JSON both won't allow it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Both are essentially "binary" formats because of either the raw ints or the 0-bytes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm not sure why you deprecate SerializedLayout, since the security issues arise only while deserializing the events, not while serializing them. Is this just meant to educate the user about the issue?
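A rough sketch of the 0-byte-terminated read path discussed above: accumulate raw bytes until a NUL terminator, then parse the chunk. Using Log4jJsonObjectMapper and Log4jLogEvent here follows Mikael's pointer to JsonInputStreamLogEventBridge earlier in the thread; the class name and the byte-at-a-time loop are illustrative, not code from the thread:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.logging.log4j.core.LogEvent;
import org.apache.logging.log4j.core.impl.Log4jLogEvent;
import org.apache.logging.log4j.core.jackson.Log4jJsonObjectMapper;

// Reads 0-byte-terminated JSON events from a stream, one event at a time.
class ZeroByteTerminatedJsonReader {

    private final Log4jJsonObjectMapper mapper = new Log4jJsonObjectMapper();

    /**
     * Returns the next event, or null at end of stream. The splitting
     * operates on raw bytes; no chars or Strings are involved until a
     * complete event has been isolated.
     */
    LogEvent readEvent(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == 0) { // terminator: one complete event sits in the buffer
                return mapper.readValue(buffer.toByteArray(), Log4jLogEvent.class);
            }
            buffer.write(b);
        }
        return null; // end of stream, no further complete event
    }
}

The length-prefixed variant would instead readInt() the size from a DataInputStream and then readFully() that many bytes, which also leaves room for compressed payloads.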
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I fixed the remote code execution exploit by implementing
>>>>>>>>>>>>>> https://github.com/huxi/lilith/blob/master/lilith-engine/src/main/java/de/huxhorn/lilith/engine/impl/eventproducer/WhitelistObjectInputStream.java
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This still doesn't fix DoS scenarios (endless loops or OOM in case of malicious data) but is close enough for me, mainly because fixing the DoS issues would really have to be handled by Java, not by user code. Manually re-implementing deserialization would likely be fragile and error-prone, possibly even introducing other security issues on its own.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd be interested in how other tools are tackling this issue. Is Chainsaw officially dead, or are they implementing receivers for event streams like this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm totally fine with implementing a new receiver as a replacement for log4j2 SerializedLayout (even though I won't be able to get rid of the Serializable receiver anyway), but the current JsonLayout just isn't up to the job at the moment.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Feel free to share this mail with your fellow developers. I'm open to suggestions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I hope this doesn't come across in a harsh way. It isn't meant to be.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Jörn.
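A minimal sketch of the whitelisting approach Jörn links to above — overriding resolveClass so that only explicitly allowed classes are deserialized; the class and constructor shown here are illustrative, not the actual Lilith code:

import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;
import java.util.Set;

// Rejects any class that is not on an explicit whitelist before it is
// deserialized, which blocks the known remote-code-execution gadget chains.
// It does not protect against denial-of-service payloads (endless loops,
// deeply nested or huge object graphs).
class WhitelistObjectInputStream extends ObjectInputStream {

    private final Set<String> allowedClassNames;

    WhitelistObjectInputStream(InputStream in, Set<String> allowedClassNames) throws IOException {
        super(in);
        this.allowedClassNames = allowedClassNames;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
        if (!allowedClassNames.contains(desc.getName())) {
            throw new InvalidClassException(desc.getName(), "not whitelisted for deserialization");
        }
        return super.resolveClass(desc);
    }
}

Checking the whitelist in resolveClass stops gadget classes before any of their code runs, which is why this addresses RCE but not the resource-exhaustion attacks mentioned above.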