lindong28 commented on pull request #24: URL: https://github.com/apache/flink-ml/pull/24#issuecomment-959040775
Thanks for the PR! I have the following high level comments after browsing through this PR: 1) Can this PR be updated to use the Param interface from FLIP-174? Its code can be found at https://github.com/apache/flink-ml/pull/10. 2) It appears that KnnTrainBatchOp implements the transform() API but not the fit() API. Should we have all training algorithm implement the Estimator interface (according to FLIP-173)? 3) The PR introduces a few abstract classes (e.g. StreamOperator, BatchOperator, AbstractAlgoOperator) that are inherited by the concrete Estimator/Model subclasses. It is not clear to me whether those abstract classes are necessary as part of the Flink ML infra. Any chance we can simplify the PR by not having those abstract classes? 4) We probably want to only add classes/methods that are necessary. For example, do we need TableSourceBatchOp? There are a few interfaces in Functional.java that are not used in the PR. And the SerializableFunction in Functional.java seems to duplicate the existing SerializableFunction in the core Flink repo. It is probably good to remove those classes/methods that are not necessary in this PR. 5) The PR added some infra classes such as Mapper/FastDistance. All infra classes effectively become the public API as we expect most algorithms be implemented based on those infra classes. We can setup a meeting to go over those classes and see what is the best way to address the target use-case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
