Hi Mich,

Thanks for your suggestions. I agree that we should avoid confusion with Spark Structured Streaming.
So, I'll go with "Structured Logging Framework for Apache Spark". This keeps the standard term "Structured Logging" and clearly distinguishes it from "Structured Streaming".

Thanks for helping shape this!

Best,
Gengliang

On Sat, Mar 2, 2024 at 12:19 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi Gengliang,
>
> Thanks for taking the initiative to improve the Spark logging system.
> Transitioning to structured logs seems like a worthy way to enhance the
> ability to analyze and troubleshoot Spark jobs and hopefully the future
> integration with cloud logging systems. While "Structured Spark Logging"
> sounds good, I was wondering if we could consider an alternative name.
> Since we already use "Spark Structured Streaming", there might be a slight
> initial confusion with the terminology. I must confess it was my initial
> reaction, so to speak.
>
> Here are a few alternative names I came up with, if I may:
>
> - Spark Log Schema Initiative
> - Centralized Logging with Structured Data for Spark
> - Enhanced Spark Logging with Queryable Format
>
> These options all highlight the key aspects of your proposal, namely
> schema, centralized logging and queryability, and might be even clearer
> for everyone at first glance.
>
> Cheers
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>
> On Fri, 1 Mar 2024 at 10:07, Gengliang Wang <ltn...@gmail.com> wrote:
>
>> Hi All,
>>
>> I propose to enhance our logging system by transitioning to structured
>> logs. This initiative is designed to tackle the challenges of analyzing
>> distributed logs from drivers, workers, and executors by allowing them to
>> be queried using a fixed schema. The goal is to improve the informativeness
>> and accessibility of logs, making it significantly easier to diagnose
>> issues.
>>
>> Key benefits include:
>>
>> - Clarity and queryability of distributed log files.
>> - Continued support for log4j, allowing users to switch back to
>>   traditional text logging if preferred.
>>
>> The improvement will simplify debugging and enhance productivity without
>> disrupting existing logging practices. The implementation is estimated to
>> take around 3 months.
>>
>> *SPIP*:
>> https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing
>> *JIRA*: SPARK-47240 <https://issues.apache.org/jira/browse/SPARK-47240>
>>
>> Your comments and feedback would be greatly appreciated.