GitHub user gdfm commented on the pull request:

https://github.com/apache/incubator-samoa/pull/11#issuecomment-101618723

Indeed, I see what you mean. Because the feedback loop in Flink is faster, the number of split attempts should increase. This is expected, but the number of such attempts is upper-bounded by the number made on the Local engine, where there is no delay between the request for the split criterion and the response from the local statistics.

We already have some flow control in PrequentialEvaluation to regulate the ingestion rate. I'll play with it a bit to see what happens.

When you put the 2-second delay in the Flink Processors, what happens (I guess) is that the whole data stream passes through a very rough, sub-optimal version of the tree. So it's very fast, but the precision drops considerably because of the artificial limit on the number of split attempts.
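
To make the argument above concrete, here is a toy sketch of how the feedback delay could cap the number of split attempts. It is not SAMOA or Flink code. It assumes a single leaf that requests a split once it has seen `grace_period` instances, and that cannot issue a new request while a response is still pending. The function name and all parameters are hypothetical, and the delay is measured in instances rather than seconds.

```python
def count_split_attempts(n_instances, grace_period, delay):
    """Count split attempts for one leaf in a toy model (not SAMOA code).

    delay: number of instances that arrive between a split request and
    its response. No new request is issued while one is pending.
    """
    attempts = 0
    seen_since_attempt = 0
    pending_until = -1  # instance index at which the pending response arrives
    for i in range(n_instances):
        seen_since_attempt += 1
        if i < pending_until:
            continue  # still waiting for the previous response
        if seen_since_attempt >= grace_period:
            attempts += 1
            seen_since_attempt = 0
            pending_until = i + delay
    return attempts


if __name__ == "__main__":
    # delay=0 plays the role of the Local engine (immediate response).
    for delay in (0, 150, 500):
        print(delay, count_split_attempts(1000, 100, delay))
```

In this model a delay of zero gives the largest number of attempts, and any longer delay can only reduce it. That matches the intuition that an artificial delay lets most of the stream pass through a tree that has had few chances to grow.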