[ 
https://issues.apache.org/jira/browse/SYSTEMML-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy updated SYSTEMML-700:
----------------------------
    Description: 
The Logistic Regression algorithm requires that category labels be labeled as 0 
up to the number of classes-1. It should be able to handle any set of category 
labels provided by the user. B_out should have the appropriate size regardless 
of the values of the labels given, and the algorithm should also preserve the 
original labeling for the user.

Added detail:

The solution I'm currently using is to transform the labels from whatever 
values they are to 0, 1, 2, ... beforehand, and then transform them back to 
their original labels after the algorithm runs.
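
A minimal sketch of that workaround in Python, for illustration only. The 
helper names encode_labels and decode_labels are hypothetical and are not part 
of SystemML; the sketch assumes the labels are sortable values and that the 
model is trained on the recoded labels between the two steps.

```python
def encode_labels(labels):
    """Map arbitrary labels to 0..K-1 and return (codes, classes)."""
    classes = sorted(set(labels))  # distinct labels, in sorted order
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


def decode_labels(codes, classes):
    """Map codes 0..K-1 back to the original labels."""
    return [classes[code] for code in codes]


y = [4, 6, 5, 4, 6]
codes, classes = encode_labels(y)        # codes are now 0, 1, 2
# ... train on `codes` and predict here (not shown) ...
restored = decode_labels(codes, classes)  # original labels again
```

With K distinct labels in classes, the expected number of coefficient sets is 
K - 1, independent of the label values themselves.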

Currently the algorithm doesn't handle class values that don't start at 0 or 1, 
and doesn't handle non-contiguous integers, both of which can come up. For 
example, class labels 4, 5, 6 return 5 sets of coefficients (the correct number 
is 2), and class labels -1, 0, 1 return just one set of coefficients (the 
correct number is 2).

Handling frames with strings would make for a really great user experience; 
that could look like R's coercion internally. Both glmnet and scikit-learn 
handle string label arguments, but both APIs are weakly typed as well.

  was: The Logistic Regression algorithm requires that category labels be 
labeled as 0 up to the number of classes-1. It should be able to handle any set 
of category labels provided by the user. B_out should have the appropriate size 
regardless of the values of the labels given, and the algorithm should also 
preserve the original labeling for the user.


> Inflexible category labels for Multinomial Logistic Regression
> --------------------------------------------------------------
>
>                 Key: SYSTEMML-700
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-700
>             Project: SystemML
>          Issue Type: Bug
>          Components: Algorithms
>            Reporter: Jeremy
>            Priority: Minor
>   Original Estimate: 4h
>  Remaining Estimate: 4h



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
