[ 
https://issues.apache.org/jira/browse/SPARK-26399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ron Hu updated SPARK-26399:
---------------------------
    Description: 
Add the peak values for the metrics to the stages REST API. Also add a new 
executorSummary REST API, which will return executor summary metrics for a 
specified stage:
{code:java}
curl http://<spark history server>:18080/api/v1/applications/<application_id>/<application_attempt>/stages/<stage_id>/<stage_attempt>/executorMetricsSummary{code}
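As a rough illustration of how a client might assemble the URL used in the curl command above, here is a minimal Python sketch. The helper name executor_metrics_summary_url and the sample host and IDs in the usage note are hypothetical. Only the path layout and port 18080 come from the description, and the endpoint itself is a proposal in this issue, not a confirmed API.

```python
from urllib.parse import quote


def executor_metrics_summary_url(host, app_id, app_attempt, stage_id,
                                 stage_attempt, port=18080):
    """Build the proposed executorMetricsSummary URL for one stage attempt.

    The path layout follows the curl example in the description; the
    function name and argument names are illustrative assumptions.
    """
    segments = [
        "api", "v1", "applications", app_id, app_attempt,
        "stages", stage_id, stage_attempt, "executorMetricsSummary",
    ]
    # Percent-encode each segment so unusual IDs cannot break the path.
    path = "/".join(quote(str(segment), safe="") for segment in segments)
    return f"http://{host}:{port}/{path}"
```

For example, executor_metrics_summary_url("historyserver.example.com", "application_1", 1, 3, 0) produces a URL ending in /applications/application_1/1/stages/3/0/executorMetricsSummary, which could then be passed to curl.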
Add parameters to the stages REST API to specify:
 * filtering for task status, and returning tasks that match (for example, 
FAILED tasks).
 * task metric quantiles, and adding the task summary if specified
 * executor metric quantiles, and adding the executor summary if specified

The description above is too brief to be clear.  [~angerszhuuu] and 
[~ron8hu] discussed a generic and consistent form for the endpoint 
/application/\{app-id}/stages.  It can be:

/application/\{app-id}/stages?details=[true|false]&status=[ACTIVE|COMPLETE|FAILED|PENDING|SKIPPED]&withSummaries=[true|false]&taskStatus=[RUNNING|SUCCESS|FAILED|PENDING]

where query parameter details=true shows the detailed task information 
within each stage.  The default value is details=false;

query parameter status selects the stages with the specified status;

query parameter withSummaries=true shows both the task summary and the 
executor summary, each as a percentile distribution.  The default value is 
withSummaries=false;

query parameter taskStatus shows only the tasks with the specified status 
within their corresponding stages.
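
To make the parameter combinations concrete, here is a minimal Python sketch that builds the query string for the proposed stages endpoint and rejects values outside the lists given above. The function name stages_query and its argument names are hypothetical. The path is copied from the description as written, and the proposal may change before it is implemented.

```python
from urllib.parse import urlencode

# Allowed values as listed in the description; the constant names are illustrative.
STAGE_STATUSES = {"ACTIVE", "COMPLETE", "FAILED", "PENDING", "SKIPPED"}
TASK_STATUSES = {"RUNNING", "SUCCESS", "FAILED", "PENDING"}


def stages_query(app_id, details=False, status=None,
                 with_summaries=False, task_status=None):
    """Return the proposed stages path and query string for one application."""
    if status is not None and status not in STAGE_STATUSES:
        raise ValueError(f"unknown stage status: {status}")
    if task_status is not None and task_status not in TASK_STATUSES:
        raise ValueError(f"unknown task status: {task_status}")

    # Booleans are sent as lowercase true/false, matching the description.
    params = [("details", str(details).lower())]
    if status is not None:
        params.append(("status", status))
    params.append(("withSummaries", str(with_summaries).lower()))
    if task_status is not None:
        params.append(("taskStatus", task_status))

    # Path copied from the description; a real deployment would prefix
    # the server address and API root.
    return f"/application/{app_id}/stages?{urlencode(params)}"
```

With no optional arguments the sketch sends details=false and withSummaries=false explicitly, mirroring the defaults stated above. Passing task_status="FAILED" together with details=True would ask for only the failed tasks within each stage.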

  was:
Add the peak values for the metrics to the stages REST API. Also add a new 
executorSummary REST API, which will return executor summary metrics for a 
specified stage:
{code:java}
curl http://<spark history server>:18080/api/v1/applications/<application_id>/<application_attempt>/stages/<stage_id>/<stage_attempt>/executorMetricsSummary{code}
Add parameters to the stages REST API to specify:
 * filtering for task status, and returning tasks that match (for example, 
FAILED tasks).
 * task metric quantiles, and adding the task summary if specified
 * executor metric quantiles, and adding the executor summary if specified

Note that the above description is too brief to be clear.  Ron Hu added the 
additional details to explain the use cases from the downstream products.  See 
the comments dated 1/07/2021 with a couple of sample json files.


> Add new stage-level REST APIs and parameters
> --------------------------------------------
>
>                 Key: SPARK-26399
>                 URL: https://issues.apache.org/jira/browse/SPARK-26399
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Edward Lu
>            Priority: Major
>         Attachments: executorMetricsSummary.json, 
> lispark230_restapi_ex2_stages_failedTasks.json, 
> lispark230_restapi_ex2_stages_withSummaries.json, 
> stage_executorSummary_image1.png


