jerryshao opened a new issue, #11171:
URL: https://github.com/apache/gravitino/issues/11171

   ### Describe the feature
   
   Add a `JsonAuditFormatter` to the audit subsystem alongside the existing 
`SimpleFormatter` and `SimpleFormatterV2` TSV formatters.
   
   The new formatter must:
   - Output one JSON object per audit row.
   - Use ISO 8601 timestamps with millisecond precision and an explicit
timezone offset.
   - Serialize all fields, including the `customInfo` map, which both TSV
formatters currently drop silently even though it is part of the `AuditLog`
interface.
   - Scrub sensitive HTTP headers (`Authorization`, `Cookie`, 
`X-Amz-Security-Token`) and credential-bearing properties (`s3.access-key-id`, 
`jdbc-password`) before serialization.
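
As a rough illustration of the timestamp and scrubbing requirements above, here is a minimal Java sketch. The class name `AuditScrubSketch`, the `[REDACTED]` placeholder, and the case-insensitive key matching are assumptions made for this example; the issue does not say whether sensitive entries should be masked or removed entirely.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative helpers only; not part of Gravitino. */
final class AuditScrubSketch {
  // ISO 8601 with millisecond precision and an explicit offset ("Z" for UTC).
  private static final DateTimeFormatter ISO_MILLIS =
      DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSXXX", Locale.ROOT);

  // Names taken from the issue text, lower-cased so matching ignores case
  // (case-insensitive matching is an assumption of this sketch).
  private static final Set<String> SENSITIVE_KEYS =
      new HashSet<>(Arrays.asList(
          "authorization", "cookie", "x-amz-security-token",
          "s3.access-key-id", "jdbc-password"));

  // Assumed placeholder; the issue does not prescribe one.
  static final String REDACTED = "[REDACTED]";

  private AuditScrubSketch() {}

  /** Formats epoch milliseconds at the given offset. */
  static String formatTimestamp(long epochMillis, ZoneOffset offset) {
    return ISO_MILLIS.format(Instant.ofEpochMilli(epochMillis).atOffset(offset));
  }

  /** Returns a copy of the map with the values of sensitive keys masked. */
  static Map<String, String> scrub(Map<String, String> input) {
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : input.entrySet()) {
      String key = e.getKey();
      boolean sensitive =
          key != null && SENSITIVE_KEYS.contains(key.toLowerCase(Locale.ROOT));
      out.put(key, sensitive ? REDACTED : e.getValue());
    }
    return out;
  }
}
```

Under these assumptions, `formatTimestamp(0L, ZoneOffset.UTC)` yields `1970-01-01T00:00:00.000Z`, and a non-zero offset such as `+02:00` is written out explicitly. `scrub` copies the map rather than mutating it, so the caller's headers and properties stay untouched.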
   
   ### Motivation
   
   The two existing formatters (`SimpleFormatter`, `SimpleFormatterV2`) both 
produce tab-separated text, which has three practical problems:
   
   1. **SIEM incompatibility.** Enterprise SIEMs (Splunk, Datadog, Sumo Logic,
Elastic) parse JSON natively. TSV requires custom parsers, which tend to break
whenever the schema changes.
   
   2. **Industry alignment.** Structured JSON is the standard for compliance 
audit output (AWS CloudTrail, Kubernetes audit, Snowflake, Databricks Unity 
Catalog, Apache Polaris, OpenTelemetry). TSV is an outlier that creates 
friction during compliance evaluations.
   
   3. **Blocks structured data capture.** Response payloads from list 
operations and credential-vending events are structured data that cannot be 
represented in flat TSV. A JSON formatter is a prerequisite to capturing that 
data in audit rows.
   
   ### Describe the solution
   
   - Ship a `JsonAuditFormatter` class implementing the existing `Formatter` 
interface.
   - Make the JSON formatter the new default; operators requiring legacy TSV 
output can opt back in via configuration.
   - Retain `SimpleFormatter` and `SimpleFormatterV2` as-is for backward 
compatibility.
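
The default-plus-opt-in behaviour described above could look roughly like the following Java sketch. Every name in it is hypothetical: the configuration key `example.audit.format`, the `AuditFormat` enum, and the values `json`, `tsv-v1`, and `tsv-v2` are invented for illustration and do not correspond to Gravitino's actual configuration.

```java
import java.util.Locale;
import java.util.Map;

/** Illustrative only: every name below is a stand-in, not a Gravitino identifier. */
final class FormatterSelectionSketch {
  /** Hypothetical configuration key, invented for this example. */
  static final String FORMAT_KEY = "example.audit.format";

  /** Hypothetical labels for the JSON formatter and the two legacy TSV formatters. */
  enum AuditFormat { JSON, TSV_V1, TSV_V2 }

  private FormatterSelectionSketch() {}

  /** Picks a format from configuration, falling back to JSON when unset. */
  static AuditFormat select(Map<String, String> config) {
    String raw = config.get(FORMAT_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return AuditFormat.JSON; // JSON is the default when nothing is configured
    }
    switch (raw.trim().toLowerCase(Locale.ROOT)) {
      case "json":
        return AuditFormat.JSON;
      case "tsv-v1":
        return AuditFormat.TSV_V1; // explicit opt-in to legacy output
      case "tsv-v2":
        return AuditFormat.TSV_V2;
      default:
        throw new IllegalArgumentException("Unknown audit format: " + raw);
    }
  }
}
```

Rejecting unknown values, rather than silently falling back to the default, is a design choice of this sketch: a typo in the setting then fails loudly instead of quietly changing the audit output format.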
   
   ### Additional context
   
   The `customInfo` map is already present on the `AuditLog` interface but is 
not serialized by either current formatter. The JSON formatter should be the 
first to expose it as a structured field.
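
To make "structured field" concrete, the sketch below writes one row as a single-line JSON object with `customInfo` nested as its own object. It is a dependency-free illustration, not the proposed implementation: the class `JsonRowSketch` and the field names `user` and `successful` are assumptions, and a real formatter would more likely delegate encoding to a JSON library.

```java
import java.util.Map;

/** Illustrative only: a hand-rolled encoder that keeps the sketch dependency-free. */
final class JsonRowSketch {
  private JsonRowSketch() {}

  /** Encodes one audit row as a single-line JSON object. */
  static String toJsonLine(Map<String, Object> row) {
    StringBuilder sb = new StringBuilder();
    writeValue(sb, row);
    return sb.toString();
  }

  private static void writeValue(StringBuilder sb, Object value) {
    if (value == null) {
      sb.append("null");
    } else if (value instanceof Map) {
      // Nested maps (such as customInfo) become nested JSON objects.
      sb.append('{');
      boolean first = true;
      for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
        if (!first) {
          sb.append(',');
        }
        first = false;
        writeString(sb, String.valueOf(e.getKey()));
        sb.append(':');
        writeValue(sb, e.getValue());
      }
      sb.append('}');
    } else if (value instanceof Integer || value instanceof Long
        || value instanceof Boolean) {
      sb.append(value); // written as bare JSON literals
    } else {
      writeString(sb, value.toString()); // everything else is written as a string
    }
  }

  private static void writeString(StringBuilder sb, String s) {
    sb.append('"');
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"':
          sb.append("\\\"");
          break;
        case '\\':
          sb.append("\\\\");
          break;
        case '\n':
          sb.append("\\n"); // escaping newlines keeps each row on one line
          break;
        case '\r':
          sb.append("\\r");
          break;
        case '\t':
          sb.append("\\t");
          break;
        default:
          if (c < 0x20) {
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    sb.append('"');
  }
}
```

Because tabs and newlines inside values are escaped instead of emitted raw, a `customInfo` value containing either character still produces exactly one line per row, which a flat TSV layout cannot guarantee without its own escaping rules.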

