[ 
https://issues.apache.org/jira/browse/LOG4J2-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remko Popma updated LOG4J2-1305:
--------------------------------
    Description: 
Logging in a binary format instead of in text can give large performance 
improvements. 

Logging text means going from a LogEvent object to formatted text, and then 
converting this text to bytes. Performance investigations with text-based 
logging formats like PatternLayout (see LOG4J2-930), and encoding Strings to 
bytes (LOG4J2-935, LOG4J2-1151) suggest that formatting and encoding text is 
expensive and imposes limits on the performance that can be achieved. 

A different approach would be to convert the LogEvent to a binary 
representation directly without creating a text representation first. This 
would result in extremely compact log files that are fast to write. The 
trade-off is that a binary log cannot easily be read in a general-purpose
editor like vi or Notepad. A specialized tool would be necessary to either
display the log or convert it to human-readable form.

This ticket proposes a simple BinaryLayout, where each LogEvent is logged in a 
binary format.

*Example BinaryLayout format*
||Offset||Type||Description||
|0|long|TimeMillis|
|8|long|NanoTime|
|16|int|Level|
|20|int|Logger name index - string value in separate file|
|24|int|Thread name index - string value in separate file|
|28|long|Thread ID|
|36|int|Marker index - value & hierarchy in separate file|
|40|int|Message length|
|44|int|Message type: 0=text, 1+=custom message type|
|48|byte[]|Message data (the offsets below assume 12 bytes of message data)|
|60|int|Throwable data length|
|64|byte[]|Throwable data (the offsets below assume 16 bytes of Throwable data)|
|80|int|ThreadContext key/value pair count|
|84|int|ThreadContext key index - string value in separate file|
|88|int|ThreadContext value index - string value in separate file|
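
To make the offsets in the table concrete, here is a minimal sketch of an encoder for one record. The class name {{BinaryRecordSketch}} and its {{encode}} method are hypothetical and not part of Log4j. The sketch assumes the level is passed as a plain int, that the caller has already encoded the message and Throwable to bytes, that each ThreadContext entry is written as one key index followed by one value index, and that values are big-endian (the {{ByteBuffer}} default), which the ticket has not yet decided.

```java
import java.nio.ByteBuffer;

/** Hypothetical encoder for the example record layout; not part of Log4j. */
final class BinaryRecordSketch {

    /** Bytes taken by the fixed-width fields before the message data (offsets 0 to 47). */
    static final int FIXED_PREFIX_BYTES = 48;

    static ByteBuffer encode(long timeMillis, long nanoTime, int level,
                             int loggerNameIndex, int threadNameIndex, long threadId,
                             int markerIndex, byte[] message, byte[] throwable,
                             int[] contextKeyIndexes, int[] contextValueIndexes) {
        if (contextKeyIndexes.length != contextValueIndexes.length) {
            throw new IllegalArgumentException("key and value index counts differ");
        }
        int size = FIXED_PREFIX_BYTES + message.length
                + 4 + throwable.length              // Throwable length + data
                + 4 + 8 * contextKeyIndexes.length; // pair count + (key, value) ints
        ByteBuffer buf = ByteBuffer.allocate(size); // big-endian unless changed (assumption)
        buf.putLong(timeMillis);       // offset 0
        buf.putLong(nanoTime);         // offset 8
        buf.putInt(level);             // offset 16
        buf.putInt(loggerNameIndex);   // offset 20
        buf.putInt(threadNameIndex);   // offset 24
        buf.putLong(threadId);         // offset 28
        buf.putInt(markerIndex);       // offset 36
        buf.putInt(message.length);    // offset 40
        buf.putInt(0);                 // offset 44: message type 0 = text
        buf.put(message);              // offset 48
        buf.putInt(throwable.length);
        buf.put(throwable);
        buf.putInt(contextKeyIndexes.length);
        for (int i = 0; i < contextKeyIndexes.length; i++) {
            buf.putInt(contextKeyIndexes[i]);
            buf.putInt(contextValueIndexes[i]);
        }
        buf.flip(); // make the written bytes readable from position 0
        return buf;
    }
}
```

With a 12-byte message, 16 bytes of Throwable data and one ThreadContext pair, the record is 92 bytes long and the variable fields land at offsets 60, 64, 80, 84 and 88, matching the table.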

*Versioning*
The binary file must start with a header indicating version information and
perhaps schema information that provides metadata on the log record. Schema
information may make it possible to include or exclude fields. For version 1.0,
the schema can either be fixed, as in the example above, or be a simple
bitmask over the fields listed above.
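
As an illustration of the bitmask option, the sketch below writes a 12-byte header. Everything in it is an assumption made for this example: the magic number, the split of the version into two shorts, the bit assigned to each field, and the class name {{BinaryHeaderSketch}}. The ticket does not define any of these.

```java
import java.nio.ByteBuffer;

/** Hypothetical file header: magic, version 1.0, and a field bitmask. */
final class BinaryHeaderSketch {

    /** Made-up magic number spelling "L4JB" in ASCII. */
    static final int MAGIC = 0x4C344A42;

    // Made-up bit assignments, one bit per optional field.
    static final int FIELD_TIME_MILLIS    = 1 << 0;
    static final int FIELD_NANO_TIME      = 1 << 1;
    static final int FIELD_LEVEL          = 1 << 2;
    static final int FIELD_LOGGER_NAME    = 1 << 3;
    static final int FIELD_THREAD_NAME    = 1 << 4;
    static final int FIELD_THREAD_ID      = 1 << 5;
    static final int FIELD_MARKER         = 1 << 6;
    static final int FIELD_MESSAGE        = 1 << 7;
    static final int FIELD_THROWABLE      = 1 << 8;
    static final int FIELD_THREAD_CONTEXT = 1 << 9;

    /** Writes magic (int), major and minor version (shorts), and the mask (int). */
    static ByteBuffer writeHeader(short major, short minor, int fieldMask) {
        ByteBuffer buf = ByteBuffer.allocate(12);
        buf.putInt(MAGIC);
        buf.putShort(major);
        buf.putShort(minor);
        buf.putInt(fieldMask);
        buf.flip();
        return buf;
    }

    /** Returns true if the given field bit is set in the mask. */
    static boolean hasField(int fieldMask, int field) {
        return (fieldMask & field) != 0;
    }
}
```

A reader would check the magic number and version first, then use {{hasField}} to decide which fields to expect in each record.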

*Custom Messages*
Note: custom Messages that implement the {{Encoder}} interface (introduced with 
LOG4J2-1274) can be written in binary form directly without first being 
converted to text (LOG4J2-506). Any specialized tool for reading binary log 
files should handle messages of type "text" out of the box, but could have some 
plugin mechanism for decoding custom messages.
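
One possible shape for such a plugin mechanism in a reader tool is a registry keyed by the message type from the record. This is only a sketch: {{MessageDecoderRegistrySketch}} is a hypothetical name, decoders are modelled as plain functions from bytes to text, and text messages (type 0) are assumed to be UTF-8, which the ticket does not specify.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical decoder registry for a binary log reader tool. */
final class MessageDecoderRegistrySketch {

    static final int TYPE_TEXT = 0;

    private final Map<Integer, Function<byte[], String>> decoders = new HashMap<>();

    MessageDecoderRegistrySketch() {
        // Text messages are handled out of the box (UTF-8 is an assumption).
        decoders.put(TYPE_TEXT, bytes -> new String(bytes, StandardCharsets.UTF_8));
    }

    /** Registers a decoder for a custom message type (1 or higher). */
    void register(int messageType, Function<byte[], String> decoder) {
        if (messageType < 1) {
            throw new IllegalArgumentException("custom message types start at 1");
        }
        decoders.put(messageType, decoder);
    }

    /** Decodes message data, falling back to a placeholder for unknown types. */
    String decode(int messageType, byte[] data) {
        Function<byte[], String> decoder = decoders.get(messageType);
        if (decoder == null) {
            return "<undecoded message type " + messageType + ", " + data.length + " bytes>";
        }
        return decoder.apply(data);
    }
}
```

The placeholder fallback lets the tool display a log even when no decoder is installed for one of the custom types it contains.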

*Byte Order*
TBD: Are multi-byte values like ints and longs written in big-endian or
little-endian order? This could be specified in the header, or we could fix it
to one or the other. Exchange protocols like ITCH tend to select a fixed byte
order (ITCH uses big-endian, i.e. network byte order). I like the simplicity of
a fixed byte order.
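
For reference, this is what the two byte orders look like with {{java.nio.ByteBuffer}}, which is big-endian unless told otherwise. The helper class {{ByteOrderSketch}} is a hypothetical name used only for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustration of how one int is laid out under each byte order. */
final class ByteOrderSketch {

    /** Encodes a single int using the requested byte order. */
    static byte[] encodeInt(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }
}
```

Encoding {{0x01020304}} yields the bytes 01 02 03 04 in big-endian order and 04 03 02 01 in little-endian order. With a fixed order the reader needs no per-file switch.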

*Multiple Files*
Repeating String data like thread names, logger names, marker names and
ThreadContextMap keys and values is saved to a separate string-data file. The
main log file contains an index (the zero-based line number) into the
string-data file instead of the full string. Index -1 means the String value
was {{null}}. The format of the string-data file can simply be: each unique
string on a separate line (separated by a '\n' (0x0A) character). Any '\n'
characters embedded in the string value are Unicode-escaped and written as
"\u000A".
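
The string-data file described above could be maintained along these lines. This is a sketch under assumptions: {{StringTableSketch}} is a hypothetical class, it keeps the table in memory and leaves the actual file I/O to the caller, and it escapes only the newline character as the text describes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the string-data file. */
final class StringTableSketch {

    private final Map<String, Integer> indexByValue = new HashMap<>();
    private final List<String> lines = new ArrayList<>();

    /** Returns the zero-based line number for the value, or -1 for null. */
    int indexOf(String value) {
        if (value == null) {
            return -1;
        }
        Integer existing = indexByValue.get(value);
        if (existing != null) {
            return existing; // each unique string is stored only once
        }
        int index = lines.size();
        lines.add(escape(value));
        indexByValue.put(value, index);
        return index;
    }

    /** Replaces each embedded newline with a six-character Unicode escape. */
    static String escape(String value) {
        return value.replace("\n", "\\u000A");
    }

    /** Renders the table as it would appear in the string-data file. */
    String fileContents() {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }
}
```

A string that already contains the literal escape text would be indistinguishable from an escaped newline when read back, so a fuller implementation would probably need to escape the backslash as well.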

TBD: as Matt points out in the comment, Markers are special since they are
hierarchical. One way to deal with this is to manage a separate file to save
the Marker hierarchy. Another way is to do something similar to PatternLayout:
treat it as a String value, where the string includes hierarchy information. I
like the simplicity of the latter approach.
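
A sketch of the latter approach: flatten a marker and its parents into one string, which can then go into the string-data file like any other value. The {{SimpleMarker}} class is a stand-in written for this example rather than the Log4j {{Marker}} interface, and the bracketed output format is an assumption chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical flattening of a marker hierarchy into a single string. */
final class MarkerStringSketch {

    /** Stand-in for a marker: a name plus zero or more parents. */
    static final class SimpleMarker {
        final String name;
        final List<SimpleMarker> parents;

        SimpleMarker(String name, SimpleMarker... parents) {
            this.name = name;
            this.parents = Arrays.asList(parents);
        }
    }

    /** Renders the marker name followed by its parents in brackets, recursively. */
    static String flatten(SimpleMarker marker) {
        StringBuilder sb = new StringBuilder(marker.name);
        if (!marker.parents.isEmpty()) {
            sb.append("[ ");
            for (int i = 0; i < marker.parents.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(flatten(marker.parents.get(i)));
            }
            sb.append(" ]");
        }
        return sb.toString();
    }
}
```

Because the hierarchy is part of the string, two markers with the same name but different parents get different entries in the string-data file.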

  was:
Logging in a binary format instead of in text can give large performance 
improvements. 

Logging text means going from a LogEvent object to formatted text, and then 
converting this text to bytes. Performance investigations with text-based 
logging formats like PatternLayout (see LOG4J2-930), and encoding Strings to 
bytes (LOG4J2-935, LOG4J2-1151) suggest that formatting and encoding text is 
expensive and imposes limits on the performance that can be achieved. 

A different approach would be to convert the LogEvent to a binary 
representation directly without creating a text representation first. This 
would result in extremely compact log files that are fast to write. The 
trade-off is that a binary log cannot easily be read in a general-purpose
editor like vi or Notepad. A specialized tool would be necessary to either
display the log or convert it to human-readable form.

This ticket proposes a simple BinaryLayout, where each LogEvent is logged in a 
binary format.

*Example BinaryLayout format*
||Offset||Type||Description||
|0|long|TimeMillis|
|8|long|NanoTime|
|16|int|Level|
|20|int|Logger name index - string value in separate file|
|24|int|Thread name index - string value in separate file|
|28|long|Thread ID|
|36|int|Marker index - value & hierarchy in separate file|
|40|int|Message length|
|44|int|Message type: 0=text, 1+=custom message type|
|48|byte[]|Message data (the offsets below assume 12 bytes of message data)|
|60|int|Throwable data length|
|64|byte[]|Throwable data (the offsets below assume 16 bytes of Throwable data)|
|80|int|ThreadContext key/value pair count|
|84|int|ThreadContext key index - string value in separate file|
|88|int|ThreadContext value index - string value in separate file|

*Versioning*
The binary file must start with a header indicating version information and
perhaps schema information that provides metadata on the log record. Schema
information may make it possible to include or exclude fields. For version 1.0,
the schema can either be fixed, as in the example above, or be a simple
bitmask over the fields listed above.

*Custom Messages*
Note: custom Messages that implement the {{Encoder}} interface (introduced with 
LOG4J2-1274) can be written in binary form directly without first being 
converted to text (LOG4J2-506). Any specialized tool for reading binary log 
files should handle messages of type "text" out of the box, but could have some 
plugin mechanism for decoding custom messages.

*Byte Order*
TBD: Are multi-byte values like ints and longs written in big-endian or
little-endian order? This could be specified in the header, or we could fix it
to one or the other. Exchange protocols like ITCH tend to select a fixed byte
order (ITCH uses big-endian, i.e. network byte order). I like the simplicity of
a fixed byte order.

*Multiple Files*
Repeating String data like thread names, logger names, marker names and
ThreadContextMap keys and values is saved to a separate string-data file. The
main log file contains an index (the zero-based line number) into the
string-data file instead of the full string. The format of this file can simply
be: each unique string on a separate line (separated by a '\n' (0x0A)
character). Any '\n' characters embedded in the string value are
Unicode-escaped and written as "\u000A".

TBD: as Matt points out in the comment, Markers are special since they are
hierarchical. One way to deal with this is to manage a separate file to save
the Marker hierarchy. Another way is to do something similar to PatternLayout:
treat it as a String value, where the string includes hierarchy information. I
like the simplicity of the latter approach.


> Binary Layout
> -------------
>
>                 Key: LOG4J2-1305
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-1305
>             Project: Log4j 2
>          Issue Type: New Feature
>          Components: Layouts
>            Reporter: Remko Popma
>              Labels: binary
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
