[ https://issues.apache.org/jira/browse/IGNITE-11655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anton Dmitriev updated IGNITE-11655:
------------------------------------
Description:
OneHotEncoder returns more columns than expected: a feature with two distinct
values, which should be encoded using two columns, is encoded using three.
The following example demonstrates the problem:
{code:java}
Map<Integer, Object[]> training = new HashMap<>();
training.put(0, new Object[]{42.0});
training.put(1, new Object[]{43.0});
training.put(2, new Object[]{42.0});
EncoderTrainer<Integer, Object[]> trainer = new EncoderTrainer<Integer, Object[]>()
    .withEncoderType(EncoderType.ONE_HOT_ENCODER)
    .withEncodedFeature(0);
IgniteBiFunction<Integer, Object[], Vector> processor = trainer.fit(training, 1, (k, v) -> v);
Vector res = processor.apply(1, new Object[]{42.0});
System.out.println(Arrays.toString(res.asArray()));
// Observed output: [0.0, 1.0, 0.0] (three columns for two distinct values)
{code}
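For comparison, the expected behaviour can be sketched in plain Java without any Ignite classes. This is only an illustration of one-hot encoding as described above, not the Ignite implementation: the class OneHotSketch and its methods fitCategories and encode are hypothetical names, and assigning columns in first-seen order is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of Apache Ignite.
public class OneHotSketch {
    // Assigns one column index per distinct value of the chosen feature,
    // in first-seen order (an assumption made for this example).
    static Map<Object, Integer> fitCategories(List<Object[]> rows, int featureIdx) {
        Map<Object, Integer> columns = new LinkedHashMap<>();
        for (Object[] row : rows)
            columns.putIfAbsent(row[featureIdx], columns.size());
        return columns;
    }

    // Produces a vector with exactly one column per distinct training value.
    static double[] encode(Map<Object, Integer> columns, Object value) {
        Integer col = columns.get(value);
        if (col == null)
            throw new IllegalArgumentException("Unseen category: " + value);
        double[] encoded = new double[columns.size()];
        encoded[col] = 1.0;
        return encoded;
    }

    public static void main(String[] args) {
        // Same training data as in the report: two distinct values, 42.0 and 43.0.
        List<Object[]> training = Arrays.asList(
            new Object[]{42.0}, new Object[]{43.0}, new Object[]{42.0});
        Map<Object, Integer> columns = fitCategories(training, 0);
        System.out.println(Arrays.toString(encode(columns, 42.0))); // [1.0, 0.0]
        System.out.println(Arrays.toString(encode(columns, 43.0))); // [0.0, 1.0]
    }
}
```

With two distinct training values the sketch yields vectors of length two, which is the column count the report expects from the encoder.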
was:
OneHotEncoder returns more columns than expected: a feature with two distinct
values, which should be encoded using two columns, is encoded using three.
The following example demonstrates the problem:
Map<Integer, Object[]> training = new HashMap<>();
training.put(0, new Object[]{42.0});
training.put(1, new Object[]{43.0});
training.put(2, new Object[]{42.0});
EncoderTrainer<Integer, Object[]> trainer = new EncoderTrainer<Integer, Object[]>()
    .withEncoderType(EncoderType.ONE_HOT_ENCODER)
    .withEncodedFeature(0);
IgniteBiFunction<Integer, Object[], Vector> processor = trainer.fit(training, 1, (k, v) -> v);
Vector res = processor.apply(1, new Object[]{42.0});
System.out.println(Arrays.toString(res.asArray()));
>>> [0.0, 1.0, 0.0]
> ML: OneHotEncoder returns more columns than expected
> ----------------------------------------------------
>
> Key: IGNITE-11655
> URL: https://issues.apache.org/jira/browse/IGNITE-11655
> Project: Ignite
> Issue Type: Bug
> Components: ml
> Affects Versions: 2.7
> Reporter: Anton Dmitriev
> Priority: Major
>
> OneHotEncoder returns more columns than expected: a feature with two distinct
> values, which should be encoded using two columns, is encoded using three.
> The following example demonstrates the problem:
> {code:java}
> Map<Integer, Object[]> training = new HashMap<>();
> training.put(0, new Object[]{42.0});
> training.put(1, new Object[]{43.0});
> training.put(2, new Object[]{42.0});
> EncoderTrainer<Integer, Object[]> trainer = new EncoderTrainer<Integer, Object[]>()
>     .withEncoderType(EncoderType.ONE_HOT_ENCODER)
>     .withEncodedFeature(0);
> IgniteBiFunction<Integer, Object[], Vector> processor = trainer.fit(training, 1, (k, v) -> v);
> Vector res = processor.apply(1, new Object[]{42.0});
> System.out.println(Arrays.toString(res.asArray()));
> // Observed output: [0.0, 1.0, 0.0] (three columns for two distinct values)
> {code}