[
https://issues.apache.org/jira/browse/SYSTEMML-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeremy updated SYSTEMML-700:
----------------------------
Description:
The Logistic Regression algorithm requires that category labels be labeled as 0
up to the number of classes-1. It should be able to handle any set of category
labels provided by the user. B_out should have the appropriate size regardless
of the values of the labels given, and the algorithm should also preserve the
original labeling for the user.
Added detail:
The workaround I'm currently using is to map the labels, whatever their
values, to 0, 1, 2, ... beforehand, and then map them back to their original
values after the algorithm runs.
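The workaround above can be sketched roughly as follows, in Python with NumPy
rather than DML. The helper names encode_labels and decode_labels are
hypothetical and are not part of SystemML; the sketch only assumes the labels
arrive as a one-dimensional array.

```python
import numpy as np

def encode_labels(y):
    """Map arbitrary labels to 0, 1, 2, ... (hypothetical helper, not a SystemML API)."""
    # classes: sorted distinct labels; codes[i]: position of y[i] within classes
    classes, codes = np.unique(np.asarray(y), return_inverse=True)
    return codes.reshape(-1), classes

def decode_labels(codes, classes):
    """Map codes 0, 1, 2, ... back to the original labels."""
    return classes[np.asarray(codes)]

y = [4, 6, 5, 4, 6]
codes, classes = encode_labels(y)         # codes are what the algorithm would see
restored = decode_labels(codes, classes)  # original labels recovered afterwards
```

Keeping the classes array next to the fitted model is what makes it possible to
report results in the user's original labeling.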
Currently the algorithm doesn't handle class values that don't start at 0 or 1,
and doesn't handle non-contiguous integers, both of which can come up. For
example, class labels 4, 5, 6 return 5 sets of coefficients (the correct number
is 2), and class labels -1, 0, 1 return just one set of coefficients (the
correct number is again 2).
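To make the expected counts concrete: they depend only on how many distinct
labels there are, not on the label values. A small sketch in Python with NumPy,
where expected_coefficient_sets is a hypothetical name and the "one class acts
as the baseline" convention is an assumption inferred from the counts quoted
above:

```python
import numpy as np

def expected_coefficient_sets(y):
    """Number of coefficient sets for K distinct labels, assuming one baseline class."""
    k = np.unique(np.asarray(y)).size  # K = number of distinct labels
    return max(k - 1, 0)               # K - 1 sets; 0 for empty input

print(expected_coefficient_sets([4, 5, 6, 4]))    # 3 distinct labels -> 2
print(expected_coefficient_sets([-1, 0, 1, 1]))   # 3 distinct labels -> 2
```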
Handling frames with string labels would be a really great user experience -
internally that could look like R's coercion. Both glmnet and scikit-learn
accept string label arguments, but both APIs are weakly typed as well.
was: The Logistic Regression algorithm requires that category labels be
labeled as 0 up to the number of classes-1. It should be able to handle any set
of category labels provided by the user. B_out should have the appropriate size
regardless of the values of the labels given, and the algorithm should also
preserve the original labeling for the user.
> Inflexible category labels for Multinomial Logistic Regression
> --------------------------------------------------------------
>
> Key: SYSTEMML-700
> URL: https://issues.apache.org/jira/browse/SYSTEMML-700
> Project: SystemML
> Issue Type: Bug
> Components: Algorithms
> Reporter: Jeremy
> Priority: Minor
> Original Estimate: 4h
> Remaining Estimate: 4h
>