[
https://issues.apache.org/jira/browse/SYSTEMML-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15536545#comment-15536545
]
Niketan Pansare commented on SYSTEMML-700:
------------------------------------------
Pros of existing approach (i.e. label transformation in PredictionUtils):
1. Addresses this JIRA and also allows string-based labels.
2. To attract scikit-learn/Python users, this feature is a must-have.
Cons of existing approach:
1. Performance impact, since it requires an additional preprocessing pass to
transform the labels.
2. Consistency of label conversion. For example, if an input fails or
produces incorrect results from the command line, it should behave the same
way through the API.
[~mboehm7] [~freiss] [[email protected]] Since it is important both to attract
more users and to reduce performance overhead, how about the following
solution? We add an additional parameter to the wrappers (i.e. encodeData)
and have it turned on by default in Python.
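
For reference, the label transformation under discussion can be sketched in
plain Python as below. This is only an illustration, not the code in
PredictionUtils or in the wrappers: the function names encode_labels and
decode_labels are hypothetical, and the sketch assumes the labels are
hashable and mutually comparable (all integers or all strings).

```python
def encode_labels(labels):
    """Map arbitrary labels to contiguous integers 0..K-1.

    Returns the encoded labels and the sorted list of original classes,
    which is needed later to map predictions back.
    """
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


def decode_labels(encoded, classes):
    """Map encoded integers back to the original labels."""
    return [classes[i] for i in encoded]


# Labels that neither start at 0 nor need to be integers.
encoded_int, classes_int = encode_labels([4, 5, 6, 5, 4])
encoded_str, classes_str = encode_labels(["spam", "ham", "spam"])

# Round trip: decoding restores the labels the user supplied.
restored = decode_labels(encoded_int, classes_int)
```

The extra pass over the labels in encode_labels is the preprocessing cost
listed under the cons above, which is why an opt-out switch such as the
proposed encodeData parameter may be worthwhile.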
> Inflexible category labels for Multinomial Logistic Regression
> --------------------------------------------------------------
>
> Key: SYSTEMML-700
> URL: https://issues.apache.org/jira/browse/SYSTEMML-700
> Project: SystemML
> Issue Type: Bug
> Components: Algorithms
> Reporter: Jeremy
> Priority: Minor
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> The Logistic Regression algorithm requires that category labels be labeled as
> 0 up to the number of classes-1. It should be able to handle any set of
> category labels provided by the user. B_out should have the appropriate size
> regardless of the values of the labels given, and the algorithm should also
> preserve the original labeling for the user.
> Added detail:
> The solution I'm currently using is to transform the labels from whatever
> values they are to 0, 1, 2,... beforehand, and then transform them back to
> their original labels after the algorithm runs.
> Currently the algorithm doesn't handle class values that don't start at 0 or
> 1, and doesn't handle non-contiguous integers, both of which can come up. For
> example, the result for class labels 4,5,6 will return 5 sets of coefficients
> (correct number should be 2), and class labels -1, 0, 1 returns just one set
> of coefficients (correct number should be 2).
> Handling frames with strings would be a really great user experience - that
> could look like R's coercion internally. Both glmnet and scikit-learn handle
> string label arguments, but both APIs are weakly typed as well.