Hi, Vyacheslav,

Thanks for the advice. Actually, we already have a MapReduce implementation in the ML dataset, and it is based on compute tasks. So I think I can simply reuse that solution.

Best regards,
Alexey Platonov

Tue, Sep 10, 2019, 14:27 Vyacheslav Daradur <[email protected]>:

> Hi, Alexey,
>
> I agree that Map-Reduce on demand looks like a more promising solution.
> We can use Compute tasks for the implementation.
> The 'Map' phase can be tuned to process data on some trigger (dataset
> update?) in a ContinuousQuery manner and call 'Reduce' (with some
> cache?) on demand.
>
> On Tue, Sep 10, 2019 at 2:09 PM Алексей Платонов <[email protected]>
> wrote:
> >
> > I mean metrics for model evaluation, like Accuracy or Precision/Recall for
> > ML models. They aren't the same as system metrics (like throughput). Such
> > metrics should be computed over a test set after model training. If it is
> > interesting for you, please have a look at this material:
> > https://en.wikipedia.org/wiki/Precision_and_recall . It's just a homonymy
> > between machine learning metrics and system metrics. We can't compute
> > ML metrics via Zabbix, for example.
> >
> > Best regards,
> > Alexey Platonov
> >
> > Tue, Sep 10, 2019 at 13:52, Nikolay Izhikov <[email protected]>:
> >
> > > Hello, Alexey.
> > >
> > > Why do we need distributed metrics in the first place?
> > > It seems there are many metric processing systems out there: Prometheus,
> > > Zabbix, Splunk, etc.
> > >
> > > Each of them can aggregate metrics in many ways.
> > >
> > > I think we should not use Ignite as a metrics aggregation system.
> > >
> > > What do you think?
> > >
> > > On Tue, 10/09/2019 at 13:08 +0300, Алексей Платонов wrote:
> > > > Hi Igniters!
> > > > I've been working on a prototype of distributed metrics computation for
> > > > ML models. Unfortunately, we don't have the ability to compute metrics in
> > > > a distributed manner, so metric statistics are gathered on the client
> > > > node via ScanQuery, and the whole flow of vectors from the partitions is
> > > > sent to the client. I want to avoid such behavior, and I propose a
> > > > framework for metrics computation using a MapReduce approach based on
> > > > aggregation of statistics for metrics.
> > > >
> > > > I prepared an issue in Apache Jira for this:
> > > > https://issues.apache.org/jira/browse/IGNITE-12155
> > > > Also, I prepared a PR for it: https://github.com/apache/ignite/pull/6857
> > > > Currently, the work on this framework is still in progress, but I'm going
> > > > to prepare the full PR during this week.
> > > >
> > > > By this email, I want to start a discussion about this idea.
> > > >
> > > > Best regards,
> > > > Alexey Platonov
> > > >
> > >
>
> --
> Best Regards, Vyacheslav D.
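
For readers of the archive, the idea discussed in this thread (aggregate small statistics per partition, then merge them, instead of shipping every vector to the client) can be illustrated with a minimal sketch. This is plain Python and not the Ignite implementation or its API: the names BinaryStats, map_partition, reduce_stats and metrics are hypothetical, the "partitions" are ordinary lists of (truth, prediction) pairs, and binary labels 0/1 are assumed.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class BinaryStats:
    """Additive confusion-matrix counts (hypothetical helper, not an Ignite class)."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0

    def merge(self, other: "BinaryStats") -> "BinaryStats":
        # Counts are additive, so merging two partitions is a field-wise sum.
        return BinaryStats(self.tp + other.tp, self.fp + other.fp,
                           self.tn + other.tn, self.fn + other.fn)


def map_partition(rows: Iterable[Tuple[int, int]]) -> BinaryStats:
    """'Map' step: would run next to the data; rows are (truth, prediction)."""
    tp = fp = tn = fn = 0
    for truth, pred in rows:
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif truth == 1:
            fn += 1
        else:
            tn += 1
    return BinaryStats(tp, fp, tn, fn)


def reduce_stats(parts: Iterable[BinaryStats]) -> BinaryStats:
    """'Reduce' step: only four integers per partition reach the caller."""
    return reduce(BinaryStats.merge, parts, BinaryStats())


def metrics(s: BinaryStats) -> Dict[str, float]:
    """Derive the final metrics from the merged counts."""
    total = s.tp + s.fp + s.tn + s.fn
    return {
        "accuracy": (s.tp + s.tn) / total if total else 0.0,
        "precision": s.tp / (s.tp + s.fp) if (s.tp + s.fp) else 0.0,
        "recall": s.tp / (s.tp + s.fn) if (s.tp + s.fn) else 0.0,
    }


# Two toy "partitions" of (truth, prediction) pairs.
partitions = [
    [(1, 1), (0, 0), (1, 0)],
    [(0, 1), (1, 1)],
]
total_stats = reduce_stats(map_partition(p) for p in partitions)
result = metrics(total_stats)
```

Because the counts are additive, the merge order does not matter, which is what makes this kind of statistic suitable for a map-reduce style computation; only the final division happens on the caller's side.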
