[
https://issues.apache.org/jira/browse/FLINK-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905191#comment-15905191
]
Till Rohrmann commented on FLINK-5936:
--------------------------------------
The {{PredictDataSetOperation}} is used in the {{KNN}} as well as the {{ALS}}
implementation.
I think the id should not be added to the {{Vector}} class because it is a
pure math class. Instead, we should provide a wrapper class that carries the
id. We then have to make sure that all predict operations understand this
wrapper: each one unwraps the vector, applies the algorithm, and outputs the
result wrapped again with the id.
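
To make the wrapper idea concrete, here is a minimal sketch in plain Scala. It
does not use Flink's API: {{DenseVec}}, {{Keyed}}, {{KeyedPredict.lift}} and
{{KeyedPredict.nearest}} are hypothetical names chosen for illustration, and
the toy nearest-neighbour function only stands in for a real predict
operation. The point is just the unwrap, apply, re-wrap step.

```scala
// Hypothetical stand-in for a pure math vector type (NOT Flink's Vector).
final case class DenseVec(values: Vector[Double]) {
  // Squared Euclidean distance to another vector of the same dimension.
  def squaredDistance(other: DenseVec): Double = {
    require(values.length == other.values.length, "dimension mismatch")
    values.zip(other.values).map { case (a, b) => (a - b) * (a - b) }.sum
  }
}

// Hypothetical wrapper that carries an id alongside a value,
// so the math class itself stays free of ids.
final case class Keyed[K, V](id: K, value: V)

object KeyedPredict {
  // Lifts a predict function on bare values to one on keyed values:
  // unwrap the value, apply the algorithm, wrap the result with the same id.
  def lift[K, V, P](predict: V => P): Keyed[K, V] => Keyed[K, P] =
    keyed => Keyed(keyed.id, predict(keyed.value))

  // Toy 1-nearest-neighbour lookup standing in for a real predict operation.
  // Assumes `training` is non-empty.
  def nearest(training: Seq[DenseVec]): DenseVec => DenseVec =
    query =>
      training.reduce { (a, b) =>
        if (a.squaredDistance(query) <= b.squaredDistance(query)) a else b
      }
}
```

With this shape, a caller would map {{KeyedPredict.lift(KeyedPredict.nearest(training))}}
over a collection of {{Keyed}} queries and get each prediction back with its
original id. How such a wrapper would plug into {{PredictDataSetOperation}} is
left open here.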
> Can't pass keyed vectors to KNN join algorithm
> ------------------------------------------------
>
> Key: FLINK-5936
> URL: https://issues.apache.org/jira/browse/FLINK-5936
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Affects Versions: 1.1.3
> Reporter: Alex DeCastro
> Priority: Minor
>
> Hi there,
> I noticed that for Scala 2.10/Flink 1.1.3 there's no way to recover keys from
> the predict method of KNN join even if the Vector (FlinkVector) class gets
> extended to allow for keys.
> If I create a class, say SparseVectorsWithKeys, the predict method will return
> SparseVectors only. Any workarounds here?
> Would it be possible to either extend the Vector class or the ML models to
> consume and output keyed vectors? This is very important for NLP and for
> most ML pipeline debugging, including logging.
> Thanks a lot
> Alex