GitHub user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/1565#issuecomment-179172028
  
    Really good work @f-sander. Good test coverage and good code documentation. 
    
    It would be good to add some online documentation for this algorithm
    (see flink/docs/libraries/ml).
    
    I have a concern about scalability. I fear that the current
    implementation is effectively bound by the capacity of a single machine.
    In particular, sorting all of the data on the heap will quickly exhaust
    memory and crash the job as the input grows. I'm not an expert on
    isotonic regression, but it would be nice to get rid of the operator
    that collects all the input data in a single task to sort it.
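
    One possible direction for the sorting step, as a rough sketch rather
    than a worked-out proposal: range-partition the points by their key so
    that each task sorts only its own slice, and the partitions read in
    order form the globally sorted sequence. The plain-Python illustration
    below is not Flink code; the function name and the hand-picked split
    keys are hypothetical. In the DataSet API the analogous building blocks
    would, to my understanding, be `partitionByRange` followed by
    `sortPartition`. It only addresses the sort, not the pooling of
    adjacent violators across partition boundaries.

```python
from bisect import bisect_right


def range_partition_sort(points, boundaries):
    """Sort (x, y) pairs by x without collecting them in one place.

    `boundaries` is a sorted list of split keys (hypothetical; a real
    system would sample the data to choose them). Partition 0 holds
    x < boundaries[0], partition i holds boundaries[i-1] <= x < boundaries[i],
    and the last partition holds everything else.
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for point in points:
        # Route each point to the partition responsible for its key range.
        partitions[bisect_right(boundaries, point[0])].append(point)
    # Each partition is sorted independently; in a cluster this would be
    # one local sort per parallel task.
    return [sorted(part, key=lambda pt: pt[0]) for part in partitions]


if __name__ == "__main__":
    data = [(5, 1.0), (1, 2.0), (9, 0.5), (3, 4.0), (7, 3.0)]
    for index, part in enumerate(range_partition_sort(data, [4, 8])):
        print(index, part)
```

    Because the key ranges do not overlap, no single task ever has to hold
    the whole input, and no final merge step is needed.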
    
    I haven't gone through the mathematical details yet. I will do so once
    the scalability issue is resolved.

