[ 
https://issues.apache.org/jira/browse/FLINK-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897059#comment-15897059
 ] 

Till Rohrmann commented on FLINK-5936:
--------------------------------------

Hi Alex,

now I understand. This is indeed not supported at the moment. I think this 
requirement applies not only to {{KNN}} but to any {{Estimator}}, so the 
solution should generalize to all algorithms rather than being specific to 
{{KNN}}.

The {{PredictDataSetOperation}} defines how predictions are computed for a 
given model and input data, and it also determines the output type of the 
prediction. For {{KNN}}, it takes an instance of {{KNN}} (the model) and a 
{{DataSet}} of {{FlinkVector}} (or any subtype), and produces a {{DataSet}} 
of {{(FlinkVector, Array[FlinkVector])}}, where the first tuple element is the 
query vector and the array holds its k nearest neighbours.

If you want to support a different input and output type, you would have to 
implement a corresponding {{PredictDataSetOperation}} for these types.
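
As a rough illustration of that idea, here is a minimal, Flink-free sketch in 
plain Scala. It is not the FlinkML API: {{PredictOperation}}, {{Knn}} and 
{{Keyed}} are hypothetical stand-ins, {{Seq}} takes the place of {{DataSet}}, 
{{Array[Double]}} takes the place of {{FlinkVector}}, and the neighbour search 
is brute force. It only shows how an operation defined for a keyed input type 
lets the keys come back out of the prediction.

```scala
// Simplified stand-in for the type-class pattern described above; NOT the Flink API.
// Assumptions: Seq replaces DataSet, Array[Double] replaces FlinkVector.
object KeyedKnnSketch {
  type Vec = Array[Double]

  // Hypothetical keyed vector: a key carried alongside the vector data.
  final case class Keyed[K](key: K, vector: Vec)

  // Hypothetical model: holds k and the training points.
  final case class Knn[T](k: Int, training: Seq[T])

  // Stand-in for PredictDataSetOperation: defines how a model of type Model
  // turns inputs of type In into outputs of type Out.
  trait PredictOperation[Model, In, Out] {
    def predict(model: Model, input: Seq[In]): Seq[Out]
  }

  def squaredDistance(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Operation for keyed inputs: the output pairs each keyed query with its
  // k nearest keyed training points, so the keys are preserved.
  implicit def keyedKnnPredict[K]
      : PredictOperation[Knn[Keyed[K]], Keyed[K], (Keyed[K], Seq[Keyed[K]])] =
    new PredictOperation[Knn[Keyed[K]], Keyed[K], (Keyed[K], Seq[Keyed[K]])] {
      def predict(model: Knn[Keyed[K]],
                  input: Seq[Keyed[K]]): Seq[(Keyed[K], Seq[Keyed[K]])] =
        input.map { query =>
          // Brute-force search: sort all training points by distance to the query.
          val nearest = model.training
            .sortBy(t => squaredDistance(query.vector, t.vector))
            .take(model.k)
          (query, nearest)
        }
    }

  // Generic entry point: the implicit operation selects the input and output types.
  def predict[M, In, Out](model: M, input: Seq[In])(
      implicit op: PredictOperation[M, In, Out]): Seq[Out] =
    op.predict(model, input)

  def main(args: Array[String]): Unit = {
    val training = Seq(
      Keyed("a", Array(0.0, 0.0)),
      Keyed("b", Array(1.0, 1.0)),
      Keyed("c", Array(5.0, 5.0)))
    val model = Knn(2, training)
    val result = predict(model, Seq(Keyed("q", Array(0.9, 0.9))))
    result.foreach { case (q, ns) =>
      println(s"${q.key} -> ${ns.map(_.key).mkString(", ")}")
    }
  }
}
```

Running {{main}} prints {{q -> b, a}}. In FlinkML the analogous step would 
presumably be an implicit {{PredictDataSetOperation}} for the keyed type that 
is in scope when {{predict}} is called.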

> Can't pass keyed vectors to KNN join algorithm  
> ------------------------------------------------
>
>                 Key: FLINK-5936
>                 URL: https://issues.apache.org/jira/browse/FLINK-5936
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>    Affects Versions: 1.1.3
>            Reporter: Alex DeCastro
>            Priority: Minor
>
> Hi there, 
> I noticed that for Scala 2.10/Flink 1.1.3 there's no way to recover keys from 
> the predict method of KNN join even if the Vector (FlinkVector) class gets 
> extended to allow for keys.  
> If I create a class say, SparseVectorsWithKeys the predict method will return 
> SparseVectors only. Any workarounds here?  
> Would it be possible to either extend the Vector class or the ML models to 
> consume and output keyed vectors?  This is very important to NLP and pretty 
> much a lot of ML pipeline debugging -- including logging. 
> Thanks a lot
> Alex



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
