[
https://issues.apache.org/jira/browse/FLINK-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110397#comment-15110397
]
ASF GitHub Bot commented on FLINK-1994:
---------------------------------------
Github user tillrohrmann commented on a diff in the pull request:
https://github.com/apache/flink/pull/1397#discussion_r50383721
--- Diff: docs/libs/ml/optimization.md ---
@@ -256,6 +271,79 @@ The full list of supported prediction functions can be found [here](#prediction-
</tbody>
</table>
+#### Effective Learning Rate
+
+Where:
+- $j$ is the iteration number
+- $\eta_j$ is the step size on step $j$
+- $\eta_0$ is the initial step size
+- $\lambda$ is the regularization constant
+- $k$ is the decay constant
+
+<table class="table table-bordered">
+ <thead>
+ <tr>
+ <th class="text-left" style="width: 20%">Function Name</th>
+ <th class="text-center">Description</th>
+ <th class="text-center">Function</th>
+ <th class="text-center">Called As</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><strong>Default</strong></td>
+ <td>
+ <p>
+ The default method used for determining the step size. This is equivalent to the inverse scaling method with $\tau = 0.5$. This special case is kept as the default to maintain backwards compatibility.
+ </p>
+ </td>
+ <td class="text-center">$\eta_j = \eta_0/\sqrt{j}$</td>
+ <td class="text-center">`default`</td>
+ </tr>
+ <tr>
+ <td><strong>Constant</strong></td>
+ <td>
+ <p>
+ The step size is constant throughout the learning task.
+ </p>
+ </td>
+ <td class="text-center">$\eta_j = \eta_0$</td>
+ <td class="text-center">`constant`</td>
+ </tr>
+ <tr>
+ <td><strong>Leon Bottou's Method</strong></td>
+ <td>
+ <p>
+ This is the `'optimal'` method of sklearn. It chooses the initial $t_0 = \frac{1}{\lambda \cdot \eta_0}$, based on Leon Bottou's [Learning with Large Data Sets](http://leon.bottou.org/slides/largescale/lstut.pdf)
+ </p>
+ </td>
+ <td class="text-center">$\eta_j = \frac{1}{\lambda \cdot (\frac{1}{\lambda \cdot \eta_0} + j - 1)}$</td>
+ <td class="text-center">`bottou`</td>
+ </tr>
+ <tr>
+ <td><strong>Inverse Scaling</strong></td>
+ <td>
+ <p>
+ A very common method for determining the step size.
+ </p>
+ </td>
+ <td class="text-center">$\eta_j = \frac{\lambda}{j^{\tau}}$</td>
--- End diff ---
Maybe don't use `frac` here but instead `\lambda/ j^\tau`, because the
exponent of `j` is rendered really small with `frac`.
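For concreteness, the four schedules in the diff above can be written out as plain functions. This is an illustrative sketch only, not Flink's actual API; the function names and parameter names (`eta0`, `lam`, `tau`) are made up for this example:

```python
import math

# Illustrative sketch of the learning-rate schedules from the table above.
# eta0 = initial step size, lam = regularization constant,
# tau = decay exponent, j = iteration number (1-based).

def default_rate(eta0, j):
    # Default: eta_j = eta0 / sqrt(j), i.e. inverse scaling with tau = 0.5
    return eta0 / math.sqrt(j)

def constant_rate(eta0, j):
    # Constant: the step size never changes
    return eta0

def bottou_rate(eta0, lam, j):
    # Bottou: eta_j = 1 / (lam * (1/(lam * eta0) + j - 1)); equals eta0 at j = 1
    return 1.0 / (lam * (1.0 / (lam * eta0) + j - 1))

def inverse_scaling_rate(eta0, tau, j):
    # Inverse scaling: eta_j = eta0 / j^tau
    return eta0 / j ** tau
```

Note that `default_rate` and `inverse_scaling_rate(..., tau=0.5, ...)` agree for every `j`, which is exactly the backwards-compatibility special case the table describes.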
> Add different gain calculation schemes to SGD
> ---------------------------------------------
>
> Key: FLINK-1994
> URL: https://issues.apache.org/jira/browse/FLINK-1994
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Assignee: Trevor Grant
> Priority: Minor
> Labels: ML, Starter
>
> The current SGD implementation uses as gain for the weight updates the
> formula {{stepsize/sqrt(iterationNumber)}}. It would be good to make the gain
> calculation configurable and to provide different strategies for that. For
> example:
> * stepsize/(1 + iterationNumber)
> * stepsize*(1 + regularization * stepsize * iterationNumber)^(-3/4)
> See also how to properly select the gains [1].
> Resources:
> [1] http://arxiv.org/pdf/1107.2490.pdf
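The gain strategies proposed in the issue text above can be sketched as follows. This is an illustrative Python sketch with hypothetical names, not Flink code:

```python
# Gain strategies from the issue description.
# stepsize = base step size, reg = regularization constant, j = iteration number.

def gain_sqrt(stepsize, j):
    # Current SGD gain: stepsize / sqrt(j)
    return stepsize / j ** 0.5

def gain_inverse(stepsize, j):
    # Proposed: stepsize / (1 + j)
    return stepsize / (1 + j)

def gain_power(stepsize, reg, j):
    # Proposed: stepsize * (1 + reg * stepsize * j)^(-3/4), cf. [1]
    return stepsize * (1 + reg * stepsize * j) ** -0.75
```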
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)