GitHub user gdfm commented on the pull request:

    https://github.com/apache/incubator-samoa/pull/11#issuecomment-101618723
  
    Indeed, I see what you mean. Given that the feedback loop in Flink is 
faster, the number of split attempts should increase.
    This is expected, but the number of such attempts is bounded above by the 
number made on the Local engine, where there is no delay between the request 
for the split criterion and the response from the local statistics.
    
    We already have some flow control to regulate the rate of ingestion in 
PrequentialEvaluation. I'll play with it a bit to see what happens.
    When you put the 2-second delay in the Flink Processors, what happens (I 
guess) is that all the data streams through a very rough, sub-optimal version 
of the tree. So it's very fast, but the precision drops considerably because of 
the artificial limit on the number of split attempts.
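
    To make the reasoning above concrete, here is a toy sketch (not SAMOA 
code) of a single leaf that requests a split every `grace_period` instances and 
cannot request another one until the response arrives. The function name, the 
parameters, and the idea of measuring the delay in instances rather than in 
seconds are all assumptions made for illustration.

```python
def count_split_attempts(num_instances, grace_period, feedback_delay):
    """Count split attempts for one leaf in a toy feedback-loop model.

    feedback_delay is a hypothetical delay, measured in instances that
    arrive between the split request and the response from the local
    statistics. A delay of 0 stands in for the Local engine.
    """
    attempts = 0
    seen_since_last = 0   # instances counted since the last answered attempt
    pending_until = None  # instance index at which the pending response arrives

    for i in range(num_instances):
        if pending_until is not None:
            if i < pending_until:
                # Still waiting for the response: instances stream through
                # the current (unsplit) leaf and no new attempt is issued.
                continue
            # Response has arrived: start counting towards the next attempt.
            pending_until = None
            seen_since_last = 0

        seen_since_last += 1
        if seen_since_last >= grace_period:
            attempts += 1
            pending_until = i + 1 + feedback_delay

    return attempts


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from any engine.
    for delay in (0, 100, 300, 1000):
        print(delay, count_split_attempts(1000, 200, delay))
```

    In this model the zero-delay case gives the largest number of attempts, 
and any added delay can only lower it, which is the effect I would expect the 
2-second delay to have.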


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
