Re: [ML] Distributed metrics computation

Алексей Платонов Tue, 10 Sep 2019 04:45:01 -0700

Hi, Vyacheslav,
Thanks for the advice. Actually, we already have a MapReduce implementation
in the ML dataset, and that implementation is based on compute tasks. So I
think I can simply reuse this solution.
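
For illustration only, here is a minimal sketch of the idea in plain Python. It
is not Ignite code and not the implementation from the PR. The names
BinaryStats, map_partition and evaluate are hypothetical, and it assumes binary
labels encoded as 0/1. The 'Map' step computes small additive statistics per
partition, and the 'Reduce' step merges them, so only a few counters (not the
vectors themselves) would have to travel to the caller.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class BinaryStats:
    """Mergeable confusion-matrix counts (hypothetical helper, binary labels 0/1)."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0

    def merge(self, other: "BinaryStats") -> "BinaryStats":
        # Reduce step: counts are additive, so merge order does not matter.
        return BinaryStats(self.tp + other.tp, self.fp + other.fp,
                           self.tn + other.tn, self.fn + other.fn)

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0


def map_partition(rows: Iterable[Tuple[int, int]]) -> BinaryStats:
    """Map step: count outcomes for one partition of (truth, prediction) pairs."""
    tp = fp = tn = fn = 0
    for truth, predicted in rows:
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        elif predicted == 0 and truth == 1:
            fn += 1
    return BinaryStats(tp, fp, tn, fn)


def evaluate(partitions: List[List[Tuple[int, int]]]) -> BinaryStats:
    """Run the map step on every partition and merge the partial results."""
    partials = (map_partition(p) for p in partitions)
    return reduce(BinaryStats.merge, partials, BinaryStats())
```

In this sketch, evaluate(partitions).precision() yields the metric from the
merged counters; in a cluster the map step would run next to each partition
and only the BinaryStats objects would be sent back.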


Best regards,
Alexey Platonov

Tue, 10 Sep 2019, 14:27 Vyacheslav Daradur <[email protected]>:

> Hi, Alexey,
>
> I agree that Map-Reduce on demand looks like a more promising solution.
> We can use Compute tasks for the implementation.
> The 'Map' phase can be tuned to process data on some trigger (dataset
> update?) in a ContinuousQuery manner, and 'Reduce' (with some
> cache?) can be called on demand.
>
>
> On Tue, Sep 10, 2019 at 2:09 PM Алексей Платонов <[email protected]>
> wrote:
> >
> > I mean metrics for model evaluation, like Accuracy or Precision/Recall for
> > ML models. These aren't the same as system metrics (like throughput). Such
> > metrics should be computed over a test set after model training. If you are
> > interested, please have a look at this material:
> > https://en.wikipedia.org/wiki/Precision_and_recall . It's just a homonymy
> > between machine learning metrics and system metrics. We can't compute
> > ML metrics via Zabbix, for example.
> >
> > Best regards,
> > Alexey Platonov
> >
> > Tue, 10 Sep 2019 at 13:52, Nikolay Izhikov <[email protected]>:
> >
> > > Hello, Alexey.
> > >
> > > Why do we need distributed metrics in the first place?
> > > It seems there are many metric processing systems out there: Prometheus,
> > > Zabbix, Splunk, etc.
> > >
> > > Each of them can aggregate metrics in many ways.
> > >
> > > I think we should not use Ignite as a metrics aggregation system.
> > >
> > > What do you think?
> > >
> > > On Tue, 10/09/2019 at 13:08 +0300, Алексей Платонов wrote:
> > > > Hi Igniters!
> > > > I've been working on a prototype of distributed metrics computation
> > > > for ML models. Unfortunately, we currently have no way to compute
> > > > metrics in a distributed manner, so metric statistics are gathered on
> > > > the client node via ScanQuery, and the whole flow of vectors from the
> > > > partitions is sent to the client. I want to avoid this behavior, so I
> > > > propose a framework for metrics computation that uses a MapReduce
> > > > approach based on aggregating statistics for metrics.
> > > >
> > > > I created an issue in Apache Jira for this:
> > > > https://issues.apache.org/jira/browse/IGNITE-12155
> > > > I also prepared a PR for it:
> > > > https://github.com/apache/ignite/pull/6857
> > > > Work on this framework is still in progress, but I'm going to
> > > > prepare a full PR this week.
> > > >
> > > > With this email, I want to start a discussion about this idea.
> > > >
> > > > Best regards,
> > > > Alexey Platonov
> > >
>
>
>
> --
> Best Regards, Vyacheslav D.
>
