Dear Alexey, thank you for your PR. As the author of the non-distributed metrics, I should say that it was a fast solution to keep parity with Spark ML. I had no time to implement it via our internal MapReduce approach, and your PR is really helpful.
Dear Nikolay, there is another kind of metrics (not the kind you mentioned): metrics to evaluate machine learning algorithms, for example accuracy (how many times the machine predicted correctly) and so on.

Great PR, I will have a look tomorrow

On Tue, Sep 10, 2019 at 14:44, Alexey Platonov <[email protected]> wrote:
> Hi, Vyacheslav,
> Thanks for the advice. Actually, we already have the MapReduce-approach
> implementation in the ML dataset, and this implementation is based on a
> compute task. So I think that I can just reuse this solution.
>
> Best regards,
> Alexey Platonov
>
> On Tue, Sep 10, 2019, 14:27 Vyacheslav Daradur <[email protected]> wrote:
>
> > Hi, Alexey,
> >
> > I agree that Map-Reduce on demand looks like the more promising solution.
> > We can use Compute tasks for the implementation.
> > The 'Map' phase can be tuned to process data by some trigger (dataset
> > update?) in a ContinuousQuery manner and call 'Reduce' (with some
> > cache?) on demand.
> >
> >
> > On Tue, Sep 10, 2019 at 2:09 PM Alexey Platonov <[email protected]>
> > wrote:
> > >
> > > I mean metrics for model evaluation, like Accuracy or Precision/Recall
> > > for ML models. It isn't the same as system metrics (like throughput).
> > > Such metrics should be computed over a test set after model training.
> > > If it is interesting for you, please have a look at this material:
> > > https://en.wikipedia.org/wiki/Precision_and_recall . It's just homonymy
> > > between machine learning metrics and system metrics. We can't compute
> > > ML metrics via Zabbix, for example.
> > >
> > > Best regards,
> > > Alexey Platonov
> > >
> > > On Tue, Sep 10, 2019 at 13:52, Nikolay Izhikov <[email protected]> wrote:
> > > >
> > > > Hello, Alexey.
> > > >
> > > > Why do we need distributed metrics in the first place?
> > > > It seems there are many metric-processing systems out there:
> > > > Prometheus, Zabbix, Splunk, etc.
> > > >
> > > > Each of them can aggregate metrics in many ways.
> > > >
> > > > I think we should not use Ignite as a metrics-aggregation system.
> > > >
> > > > What do you think?
> > > >
> > > > On Tue, 10/09/2019 at 13:08 +0300, Alexey Platonov wrote:
> > > > > Hi Igniters!
> > > > > I've been working on a prototype of distributed metrics computation
> > > > > for ML models. Unfortunately, we don't have the ability to compute
> > > > > metrics in a distributed manner, so metric statistics are gathered
> > > > > on the client node via ScanQuery, and the whole flow of vectors
> > > > > from the partitions is sent to the client. I want to avoid such
> > > > > behavior, and I propose a framework for metrics computation using a
> > > > > MapReduce approach based on an aggregation of statistics for metrics.
> > > > >
> > > > > I prepared an issue in Apache Jira for this:
> > > > > https://issues.apache.org/jira/browse/IGNITE-12155
> > > > > Also, I prepared a PR for it: https://github.com/apache/ignite/pull/6857
> > > > > Currently, the work on this framework is still in progress, but I'm
> > > > > going to prepare a full PR during this week.
> > > > >
> > > > > By this email, I want to start a discussion about this idea.
> > > > >
> > > > > Best regards,
> > > > > Alexey Platonov
> >
> > --
> > Best Regards, Vyacheslav D.
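(For anyone new to the thread, here is a rough illustration of the kind of metrics being discussed. The class below is a sketch for this email only, not the API from the PR: it just derives accuracy and precision/recall from the four confusion-matrix counters a binary classifier yields on a test set.)

```java
// Sketch only: illustrative names, not Ignite ML's actual evaluator API.
public class BinaryMetrics {
    public final long tp, fp, tn, fn; // true/false positives and negatives

    public BinaryMetrics(long tp, long fp, long tn, long fn) {
        this.tp = tp; this.fp = fp; this.tn = tn; this.fn = fn;
    }

    // Fraction of all predictions that were correct.
    public double accuracy()  { return (double) (tp + tn) / (tp + fp + tn + fn); }
    // Of everything predicted positive, how much really was positive.
    public double precision() { return (double) tp / (tp + fp); }
    // Of everything really positive, how much we found.
    public double recall()    { return (double) tp / (tp + fn); }

    public static void main(String[] args) {
        // 6 correct positives, 2 false alarms, 10 correct negatives, 2 misses.
        BinaryMetrics m = new BinaryMetrics(6, 2, 10, 2);
        System.out.println("accuracy  = " + m.accuracy());   // 16/20 = 0.8
        System.out.println("precision = " + m.precision());  // 6/8  = 0.75
        System.out.println("recall    = " + m.recall());     // 6/8  = 0.75
    }
}
```

Note that these four counters are exactly the kind of small, summable statistic the MapReduce discussion above is about.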

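(And a sketch of the MapReduce idea itself, again with hypothetical names rather than the actual Ignite ML API from the PR: each partition reduces its slice of the test set to a tiny statistics object, and only those objects, not the raw vectors that ScanQuery would ship, need to travel to the client.)

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: hypothetical names, not the framework proposed in the PR.
public class DistributedAccuracy {

    // Per-partition partial statistic: just two counters cross the network.
    public static final class Stats {
        public final long correct, total;
        public Stats(long correct, long total) { this.correct = correct; this.total = total; }
    }

    // "Map": runs locally against one partition's slice of the test set.
    public static Stats map(int[] labels, int[] predictions) {
        long correct = 0;
        for (int i = 0; i < labels.length; i++)
            if (labels[i] == predictions[i]) correct++;
        return new Stats(correct, labels.length);
    }

    // "Reduce": associative merge, so partials combine in any order.
    public static Stats reduce(Stats a, Stats b) {
        return new Stats(a.correct + b.correct, a.total + b.total);
    }

    public static double accuracy(List<Stats> partials) {
        Stats s = partials.stream().reduce(new Stats(0, 0), DistributedAccuracy::reduce);
        return (double) s.correct / s.total;
    }

    public static void main(String[] args) {
        // Two "partitions" of a test set.
        Stats p1 = map(new int[]{1, 0, 1, 1}, new int[]{1, 0, 0, 1}); // 3 of 4 correct
        Stats p2 = map(new int[]{0, 0}, new int[]{0, 1});             // 1 of 2 correct
        System.out.println(accuracy(Arrays.asList(p1, p2)));          // 4/6
    }
}
```

Because the merge is associative and commutative, the partial results can be combined in any order, which is what would make an on-demand 'Reduce' over Compute tasks feasible.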