Github user tillrohrmann commented on the pull request:
https://github.com/apache/flink/pull/1350#issuecomment-158966378
Thanks for your contribution, @HilmiYildirim. I have some comments.
I'm not so sure about the correctness of the implementation. Where did you
get the test data from, @HilmiYildirim? Is it from a published source?
The current implementation lacks integration with FlinkML's pipelining
mechanism. Furthermore, it only works on `integers`. What if the
observations are characters, for example?
The PR needs more documentation so that the code is easier to understand
and maintain.
Could we maybe add a smaller test file? One of the files is 2 MB. This
should definitely be smaller.
I think this PR needs a bit more effort to get into shape.