[
https://issues.apache.org/jira/browse/SPARK-47240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun updated SPARK-47240:
----------------------------------
Labels: pull-request-available releasenotes (was: pull-request-available)
> SPIP: Structured Logging Framework for Apache Spark
> ---------------------------------------------------
>
> Key: SPARK-47240
> URL: https://issues.apache.org/jira/browse/SPARK-47240
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Gengliang Wang
> Assignee: Apache Spark
> Priority: Major
> Labels: pull-request-available, releasenotes
>
> This proposal aims to enhance Apache Spark's logging system by implementing
> structured logging. This transition will change the format of the default log
> files from plain text to JSON, making them more accessible and analyzable.
> The new logs will include crucial identifiers such as worker, executor,
> query, job, stage, and task IDs, thereby making the logs more informative and
> facilitating easier search and analysis.
> h2. Current Logging Format
> The current format of Spark logs is plain text, which can be challenging to
> parse and analyze efficiently. An example of the current log format is as
> follows:
> {code:java}
> 23/11/29 17:53:44 ERROR BlockManagerMasterEndpoint: Fail to know the executor
> 289 is alive or not.
> org.apache.spark.SparkException: Exception thrown in awaitResult:
> <stacktrace…>
> Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: ..
> {code}
> h2. Proposed Structured Logging Format
> The proposed change involves structuring the logs in JSON format, which
> organizes the log information into easily identifiable fields. Here is how
> the new structured log format would look:
> {code:java}
> {
>   "ts": "23/11/29 17:53:44",
>   "level": "ERROR",
>   "msg": "Fail to know the executor 289 is alive or not",
>   "context": {
>     "executor_id": "289"
>   },
>   "exception": {
>     "class": "org.apache.spark.SparkException",
>     "msg": "Exception thrown in awaitResult",
>     "stackTrace": "..."
>   },
>   "source": "BlockManagerMasterEndpoint"
> }
> {code}
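As a rough illustration of how such a record could be produced (this is a hypothetical sketch using Python's standard-library logging, not Spark's actual log4j-based implementation), a custom formatter can serialize each record into one JSON line with the same field names as the sample above:

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line (illustrative sketch)."""

    def format(self, record):
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "msg": record.getMessage(),
            # "context" is attached via the logging `extra` mechanism below.
            "context": getattr(record, "context", {}),
            "source": record.name,
        }
        if record.exc_info:
            entry["exception"] = {
                "class": record.exc_info[0].__name__,
                "msg": str(record.exc_info[1]),
            }
        return json.dumps(entry)

# Capture output in a buffer so the result can be inspected.
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("BlockManagerMasterEndpoint")
logger.addHandler(handler)
logger.setLevel(logging.ERROR)

logger.error(
    "Fail to know the executor 289 is alive or not",
    extra={"context": {"executor_id": "289"}},
)
print(buf.getvalue().strip())
```

Each emitted line is then independently parseable, which is what makes the `spark.read.json` workflow below possible.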
> This format will let users load driver/executor/master/worker log files and
> query them directly with Spark SQL, enabling more effective troubleshooting
> and analysis, such as tracking executor losses or identifying faulty tasks:
> {code:java}
> spark.read.json("hdfs://hdfs_host/logs").createOrReplaceTempView("logs")
> /* To get all the executor lost logs */
> SELECT * FROM logs WHERE contains(msg, 'Lost executor');
> /* To get all the distributed logs about executor 289 */
> SELECT * FROM logs WHERE context.executor_id = '289';
> /* To get all the errors on host 100.116.29.4 */
> SELECT * FROM logs WHERE host = '100.116.29.4' AND level = 'ERROR';
> {code}
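The same filters can be sanity-checked without a Spark cluster. The sketch below is a plain-Python stand-in for the queries above, assuming the field names from the sample record (`msg`, `level`, `context.executor_id`) and using made-up log lines for illustration:

```python
import json

# Two hypothetical JSON-lines log records shaped like the proposed format.
raw_logs = """\
{"ts": "23/11/29 17:53:44", "level": "ERROR", "msg": "Fail to know the executor 289 is alive or not", "context": {"executor_id": "289"}, "source": "BlockManagerMasterEndpoint"}
{"ts": "23/11/29 17:53:45", "level": "INFO", "msg": "Lost executor 289 on host-1: worker lost", "context": {"executor_id": "289"}, "source": "TaskSchedulerImpl"}
"""

logs = [json.loads(line) for line in raw_logs.splitlines()]

# Equivalent of: SELECT * FROM logs WHERE contains(msg, 'Lost executor')
lost = [r for r in logs if "Lost executor" in r["msg"]]

# Equivalent of: SELECT * FROM logs WHERE context.executor_id = '289'
exec_289 = [r for r in logs if r.get("context", {}).get("executor_id") == "289"]

print(len(lost), len(exec_289))
```

Because every log line is a self-contained JSON object, the same predicates work whether the records are filtered in plain Python or via `spark.read.json` as shown above.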
>
> SPIP doc:
> [https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]