lindong28 commented on pull request #24:
URL: https://github.com/apache/flink-ml/pull/24#issuecomment-959040775


   Thanks for the PR!
   
   I have the following high level comments after browsing through this PR:
   
   1) Can this PR be updated to use the Param interface from FLIP-174? Its code 
can be found at https://github.com/apache/flink-ml/pull/10.
   
   2) It appears that KnnTrainBatchOp implements the transform() API but not 
the fit() API. Should we have all training algorithm implement the Estimator 
interface (according to FLIP-173)?
   
   3) The PR introduces a few abstract classes (e.g. StreamOperator, 
BatchOperator, AbstractAlgoOperator) that are inherited by the concrete 
Estimator/Model subclasses. It is not clear to me whether those abstract 
classes are necessary as part of the Flink ML infra. Any chance we can simplify 
the PR by not having those abstract classes?
   
   4) We probably want to only add classes/methods that are necessary. For 
example, do we need TableSourceBatchOp? There are a few interfaces in 
Functional.java that are not used in the PR. And the SerializableFunction in 
Functional.java seems to duplicate the existing SerializableFunction in the 
core Flink repo. It is probably good to remove those classes/methods that are 
not necessary in this PR.
   
   5) The PR added some infra classes such as Mapper/FastDistance. All infra 
classes effectively become the public API as we expect most algorithms be 
implemented based on those infra classes. We can setup a meeting to go over 
those classes and see what is the best way to address the target use-case.
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to