[ https://issues.apache.org/jira/browse/FLINK-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897059#comment-15897059 ]
Till Rohrmann commented on FLINK-5936:
--------------------------------------
Hi Alex,
now I understand. This is indeed not supported at the moment. This kind of
requirement does not only show up for KNN but for any {{Estimator}} in
general, so the solution should generalize to all algorithms, not just KNN.

The {{PredictDataSetOperation}} defines how predictions are computed for a
given model and input data, and it also defines the output type of the
prediction. For {{KNN}}, it takes an instance of {{KNN}} (the model) and a
{{DataSet}} of {{FlinkVectors}} (or any subtype) and produces a {{DataSet}}
of {{(FlinkVector, Array[FlinkVector])}}, where the first tuple field is the
query vector and the array of vectors contains the k closest neighbours.

If you want to support different input and output types, you would have to
implement a corresponding {{PredictDataSetOperation}} for those types.
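As a rough illustration, such an operation for inputs keyed by a {{Long}} id
could look like the sketch below. It is not how the built-in operation works:
it does a brute-force cross join instead of the blocked join {{KNN}} uses, it
hard-codes the Euclidean distance, it assumes the fitted model's
{{trainingSet}} field is accessible from your code, and the
{{KeyedKNNPrediction}} object and the {{Long}} key type are only chosen for
the example.

{code:scala}
import org.apache.flink.api.scala._
import org.apache.flink.ml.common.ParameterMap
import org.apache.flink.ml.math.{Vector => FlinkVector}
import org.apache.flink.ml.metrics.distances.EuclideanDistanceMetric
import org.apache.flink.ml.nn.KNN
import org.apache.flink.ml.pipeline.PredictDataSetOperation

object KeyedKNNPrediction {

  // Sketch of an operation with keyed input (Long, FlinkVector) and keyed
  // output (Long, Array[FlinkVector]). Brute-force for illustration only.
  implicit val keyedKNNPredict
    : PredictDataSetOperation[KNN, (Long, FlinkVector), (Long, Array[FlinkVector])] =
    new PredictDataSetOperation[KNN, (Long, FlinkVector), (Long, Array[FlinkVector])] {

      override def predictDataSet(
          instance: KNN,
          predictParameters: ParameterMap,
          input: DataSet[(Long, FlinkVector)])
        : DataSet[(Long, Array[FlinkVector])] = {

        val k = instance.parameters.get(KNN.K).getOrElse(5)
        val metric = EuclideanDistanceMetric()

        // Unpack the blocked training set of the fitted model
        // (assumes the trainingSet field is accessible here).
        val training: DataSet[FlinkVector] = instance.trainingSet match {
          case Some(blocks) => blocks.flatMap(_.values)
          case None => throw new RuntimeException("The KNN model has not been fitted.")
        }

        input
          .crossWithHuge(training)
          // (key, query vector, candidate vector, distance)
          .map { pair =>
            (pair._1._1, pair._1._2, pair._2, metric.distance(pair._1._2, pair._2))
          }
          .groupBy(0)
          .reduceGroup { candidates =>
            val all = candidates.toArray
            val key = all.head._1
            val neighbours = all.sortBy(_._4).take(k).map(_._3)
            (key, neighbours)
          }
      }
    }
}
{code}

With this implicit in scope, {{knn.predict(keyedTestSet)}} should pick up the
operation and keep the keys attached to the prediction results.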
> Can't pass keyed vectors to KNN join algorithm
> ------------------------------------------------
>
> Key: FLINK-5936
> URL: https://issues.apache.org/jira/browse/FLINK-5936
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Affects Versions: 1.1.3
> Reporter: Alex DeCastro
> Priority: Minor
>
> Hi there,
> I noticed that for Scala 2.10/Flink 1.1.3 there's no way to recover keys from
> the predict method of KNN join even if the Vector (FlinkVector) class gets
> extended to allow for keys.
> If I create a class, say SparseVectorsWithKeys, the predict method will return
> SparseVectors only. Any workarounds here?
> Would it be possible to either extend the Vector class or the ML models to
> consume and output keyed vectors? This is very important for NLP and for much
> of ML pipeline debugging, including logging.
> Thanks a lot
> Alex