pan3793 opened a new issue, #6832:
URL: https://github.com/apache/kyuubi/issues/6832

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Describe the feature
   
   Use the Spark DataSource V2 (DSv2) API to implement a connector that provides a SQL interface to YARN aggregated logs, and possibly to other YARN resources in the future.
   
   ### Motivation
   
   In large-scale Spark on YARN deployments, tens or even hundreds of thousands of Spark applications are submitted to the cluster per day. YARN collects and aggregates the application logs and stores them on HDFS. Sometimes we want to analyze these logs to identify cluster-level issues. For example, a machine with faulty hardware may frequently produce disk or network exceptions. Using Spark to analyze those logs in parallel is a straightforward way to find such problems.
   
   ### Describe the solution
   
   Usage might look like this:
   
   ```
   $ spark-sql --conf spark.sql.catalog.yarn=org.apache.kyuubi.spark.connector.yarn.YarnCatalog
   > SELECT
       app_id, app_attempt_id,
       app_start_time, app_end_time,
       container_id, host,
       file_name, line_num, message
     FROM yarn.agg_logs
     WHERE app_id = 'application_1234'
       AND container_id = 'container_12345'
       AND host = 'hadoop123.example.com';
   ```
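
The cluster-level analysis described in the motivation could be expressed as a simple aggregation over the same table. Since the connector does not exist yet, the sketch below uses an in-memory SQLite table as a stand-in for the proposed `yarn.agg_logs`. It keeps only a subset of the columns from the query above, the sample rows are made up, and `hosts_by_io_errors` is a hypothetical helper name. The SQL itself is plain `GROUP BY` syntax that Spark SQL should accept as well, but note that `LIKE` is case-insensitive in SQLite by default and case-sensitive in Spark.

```python
import sqlite3

# Made-up rows standing in for the proposed yarn.agg_logs table.
# Columns: app_id, container_id, host, file_name, line_num, message
SAMPLE_ROWS = [
    ("application_1", "container_1", "hadoop123.example.com", "stderr", 10,
     "java.io.IOException: No space left on device"),
    ("application_1", "container_1", "hadoop123.example.com", "stderr", 11,
     "java.io.IOException: Connection reset by peer"),
    ("application_2", "container_2", "hadoop123.example.com", "stderr", 7,
     "java.io.IOException: Input/output error"),
    ("application_2", "container_3", "hadoop456.example.com", "stderr", 3,
     "java.io.IOException: Connection reset by peer"),
    ("application_3", "container_4", "hadoop456.example.com", "stdout", 1,
     "Job finished successfully"),
]

# Rank hosts by the number of log lines that mention an IOException.
QUERY = """
    SELECT host,
           COUNT(*) AS error_lines,
           COUNT(DISTINCT app_id) AS affected_apps
    FROM agg_logs
    WHERE message LIKE '%IOException%'
    GROUP BY host
    ORDER BY error_lines DESC, host
"""


def hosts_by_io_errors(rows):
    """Load rows into an in-memory table and return (host, lines, apps) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE agg_logs ("
            "app_id TEXT, container_id TEXT, host TEXT, "
            "file_name TEXT, line_num INTEGER, message TEXT)"
        )
        conn.executemany("INSERT INTO agg_logs VALUES (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for host, error_lines, affected_apps in hosts_by_io_errors(SAMPLE_ROWS):
        print(host, error_lines, affected_apps)
```

With the real connector, the same statement would presumably run against `yarn.agg_logs` directly, letting Spark scan the aggregated log files in parallel.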
   
   ### Additional context
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
   - [ ] No. I cannot submit a PR at this time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

