[jira] [Updated] (SPARK-14567) Add instrumentation logs to MLlib training algorithms

Timothy Hunter (JIRA) Tue, 12 Apr 2016 11:38:50 -0700

     [ https://issues.apache.org/jira/browse/SPARK-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Timothy Hunter updated SPARK-14567:
-----------------------------------
    Description: 
To debug performance issues when training MLlib algorithms,
it is useful to log some metrics about the training dataset, the training
parameters, etc.

This ticket is an umbrella to add some simple logging messages to the most 
common MLlib estimators. There should be no performance impact on the current 
implementation, and the output is simply printed in the logs.

Here are some values that are of interest when debugging training tasks:
* number of features
* number of instances
* number of partitions
* number of classes
* input RDD/DF cache level
* hyper-parameters
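
As a rough illustration of the values listed above, here is a minimal sketch in plain Python using only the standard library. It is not the Spark implementation and does not use any Spark API: the names `TrainingSummary`, `summarize`, `log_summary`, and the logger name are hypothetical, and the "partitions" are ordinary in-memory lists of (label, features) pairs standing in for a distributed dataset. In a real distributed job these counts would presumably be taken from work the algorithm already does, so that logging adds no extra pass over the data.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("training.instrumentation")

Row = Tuple[float, Sequence[float]]  # (label, feature vector)


@dataclass
class TrainingSummary:
    """Values of interest when debugging a training task (illustrative)."""
    num_features: int
    num_instances: int
    num_partitions: int
    num_classes: int
    cache_level: str
    hyper_params: Dict[str, Any] = field(default_factory=dict)

    def to_log_line(self) -> str:
        # Sort the hyper-parameters so the log line is stable across runs.
        params = ", ".join(
            f"{name}={value}" for name, value in sorted(self.hyper_params.items())
        )
        return (
            f"numFeatures={self.num_features} "
            f"numInstances={self.num_instances} "
            f"numPartitions={self.num_partitions} "
            f"numClasses={self.num_classes} "
            f"cacheLevel={self.cache_level} "
            f"params={{{params}}}"
        )


def summarize(
    partitions: List[List[Row]], cache_level: str, hyper_params: Dict[str, Any]
) -> TrainingSummary:
    """Collect the metrics from an in-memory stand-in for a partitioned dataset."""
    rows = [row for part in partitions for row in part]
    num_features = len(rows[0][1]) if rows else 0
    distinct_labels = {label for label, _ in rows}
    return TrainingSummary(
        num_features=num_features,
        num_instances=len(rows),
        num_partitions=len(partitions),
        num_classes=len(distinct_labels),
        cache_level=cache_level,
        hyper_params=dict(hyper_params),
    )


def log_summary(summary: TrainingSummary) -> None:
    # The output is simply written to the logs at INFO level.
    logger.info("Training instrumentation: %s", summary.to_log_line())


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    data = [
        [(0.0, [1.0, 2.0]), (1.0, [3.0, 4.0])],  # partition 0
        [(1.0, [5.0, 6.0])],                     # partition 1
    ]
    log_summary(summarize(data, "MEMORY_ONLY", {"maxIter": 10, "regParam": 0.1}))
```

Keeping the summary in a single structure and emitting one log line per training run makes the output easy to grep when comparing runs; the field names in the log line are assumptions made for this sketch.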

  was:
In order to debug performance issues when training mllib algorithms,
it is useful to log some metrics about the training dataset, the training 
parameters, etc.

This ticket is an umbrella to add some simple logging messages to the most 
common MLlib estimators. There should be no performance impact on the current 
implementation, and the output is simply printed in the logs.

Here are some values that are of interest when debugging training tasks:
* number of features
* number of instances
* number of partitions
* number of classes
* input RDD/DF cache level
* hyper-parameters

I suggest to start with the most common al


> Add instrumentation logs to MLlib training algorithms
> -----------------------------------------------------
>
>                 Key: SPARK-14567
>                 URL: https://issues.apache.org/jira/browse/SPARK-14567
>             Project: Spark
>          Issue Type: Umbrella
>          Components: MLlib
>            Reporter: Timothy Hunter



