GitHub user senorcarbone commented on the pull request:

https://github.com/apache/incubator-samoa/pull/11#issuecomment-100484501

Hello again @gdfm and @abifet,

Over the last two days I did a lot of cross-profiling between Storm and Flink, running the same `VerticalHoeffdingTree` task under different configurations, and I think the results are quite interesting.

It looks like the algorithm's performance (and accuracy) depends heavily on the ingestion speed of the local statistics processors. The paradox is that the faster the ingestion, the slower the whole computation becomes over time: the higher the rate at which attribute events are sent to the local statistics processors, the more updates the model aggregator gets back.

The average processing delay (measured as the number of flattened instances processed by the aggregator between sending a process event and receiving the respective local statistics) is ~2k instances for Flink and around 400k instances for Storm. Also, in Storm the aggregator continuously broadcasts ~100-200 attribute messages to the local processors on average, while in Flink it broadcasts ~2100 attribute messages, which I assume is due to the rate at which it gets results back. These numbers were collected locally on each component, and there was no message duplication.

Since you worked on the algorithm, do you find this behavior reasonable?
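For reference, here is a minimal sketch of how a delay counted in instances (rather than wall-clock time) could be tracked on the aggregator side. It is written in Python for brevity and is not SAMOA code: `DelayTracker`, its method names, and the request ids are hypothetical, and it assumes a single reply per request.

```python
class DelayTracker:
    """Counts delay as the number of instances seen between a request and its reply."""

    def __init__(self):
        self.instances_seen = 0   # instances processed by the aggregator so far
        self.pending = {}         # request id -> instance count when the request was sent
        self.delays = []          # completed delays, in instances

    def on_instance(self):
        # Call once for every instance the aggregator processes.
        self.instances_seen += 1

    def on_request_sent(self, request_id):
        # Remember how many instances had been processed when the request left.
        self.pending[request_id] = self.instances_seen

    def on_statistics_received(self, request_id):
        # Delay is the number of instances processed since the request was sent.
        sent_at = self.pending.pop(request_id, None)
        if sent_at is None:
            return None  # unknown or already answered request
        delay = self.instances_seen - sent_at
        self.delays.append(delay)
        return delay

    def average_delay(self):
        return sum(self.delays) / len(self.delays) if self.delays else 0.0


if __name__ == "__main__":
    tracker = DelayTracker()
    tracker.on_request_sent("req-1")
    for _ in range(3):
        tracker.on_instance()
    print(tracker.on_statistics_received("req-1"))
    print(tracker.average_delay())
```

Counting in instances rather than milliseconds keeps the metric comparable across engines with different throughput, which is the sense in which the Flink and Storm figures above are reported.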