Nandish Jayaram created MADLIB-1245:
---------------------------------------
Summary: Randomize data after standardization
Key: MADLIB-1245
URL: https://issues.apache.org/jira/browse/MADLIB-1245
Project: Apache MADlib
Issue Type: Improvement
Components: Module: Utilities
Reporter: Nandish Jayaram
The functions `utils_ind_var_scales` and `utils_ind_var_scales_grouping` in
`convex.utils_regularization` are used to standardize the input data, which is
then fed to the underlying gradient descent solver. Gradient descent usually
works better when the input rows are in random order.
The current functions create a temp table consisting of the standardized
version of the input data, but the rows are not randomly distributed. Can we
distribute them randomly? This might affect multiple modules, so every
affected module must be tested thoroughly to ensure the change is acceptable.
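
A minimal sketch of the requested behavior, written in plain Python rather than against MADlib's temp tables. The function name `standardize_and_shuffle` and its `seed` parameter are hypothetical and are not part of `convex.utils_regularization`. It only illustrates the idea of standardizing each column and then randomizing the row order before the data reaches a solver.

```python
import random
from statistics import fmean, pstdev


def standardize_and_shuffle(rows, seed=None):
    """Scale each column to zero mean and unit variance, then shuffle the rows.

    Hypothetical helper for illustration only; not MADlib's implementation.
    """
    if not rows:
        return []

    # Transpose so that per-column statistics can be computed.
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid
    # dividing by zero (the column then becomes all zeros).
    stds = [pstdev(col) or 1.0 for col in columns]

    scaled = [
        tuple((x - m) / s for x, m, s in zip(row, means, stds))
        for row in rows
    ]

    # Randomize row order; a fixed seed makes the order reproducible in tests.
    rng = random.Random(seed)
    rng.shuffle(scaled)
    return scaled
```

Shuffling changes only the order of the rows, so any module that consumes the standardized table should see the same set of values. That gives a simple invariant to check when testing the affected modules.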