Github user senorcarbone commented on the pull request:

    https://github.com/apache/incubator-samoa/pull/11#issuecomment-100484501
  
    Hello again @gdfm and @abifet,
    I did a lot of cross-profiling between Storm and Flink over the last two 
days, running the same `VerticalHoeffdingTree` task under different 
configurations, and I think the results are quite interesting. 
    
    It looks like the algorithm's performance (and accuracy) depends heavily on 
the ingestion speed of the local statistics processors. The paradox is that the 
higher that speed, the slower the whole computation gets over time: the faster 
attribute events are sent to the local statistics processors, the more updates 
the model aggregator gets back. 
    
    The average processing delay (measured as the number of flattened instances 
processed by the aggregator between sending a process event and receiving the 
respective local statistics) is ~2k instances for Flink and around 400k 
instances for Storm. Also, in Storm the aggregator continuously broadcasts 
~100-200 attribute messages to the local processors on average, while in Flink 
it broadcasts ~2100 attribute messages, which I assume is due to the rate at 
which it gets results back. These numbers were collected locally on each 
component, and there was no message duplication. 
    Since you worked on the algorithm, do you find this behavior reasonable?
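
    To make the delay metric above concrete, here is a minimal sketch of how it 
could be measured, counting instances rather than wall-clock time. It is 
written in Python for brevity and is not SAMOA code: the `DelayTracker` class, 
its method names, and the request ids are hypothetical and only illustrate the 
counting logic.

```python
class DelayTracker:
    """Hypothetical helper: tracks delay in instances seen by the aggregator."""

    def __init__(self):
        self.instances_seen = 0  # flattened instances processed so far
        self.pending = {}        # request id -> instances_seen at send time
        self.delays = []         # completed delays, in instances

    def on_instance(self):
        # Called once for every flattened instance the aggregator processes.
        self.instances_seen += 1

    def on_process_sent(self, request_id):
        # Called when the aggregator sends a process event for request_id.
        self.pending[request_id] = self.instances_seen

    def on_local_result(self, request_id):
        # Called when the matching local statistics come back.
        sent_at = self.pending.pop(request_id)
        delay = self.instances_seen - sent_at
        self.delays.append(delay)
        return delay

    def average_delay(self):
        # Mean delay over all completed requests (0.0 if none completed yet).
        return sum(self.delays) / len(self.delays) if self.delays else 0.0
```

    With this kind of counter, a faster engine returns results after fewer 
intervening instances, so the average is smaller even though more results 
arrive per unit of time.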


