> it makes more sense to have tooling around Druid, to do slice and dice
> the data that you need, and do the ml stuff in sklearn, or even in spark

I agree with this sentiment. Druid as an execution engine is very good at
doing distributed aggregation (distributed reduce). What advantage does
Druid as an engine have that Spark does not for ML?
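To make "distributed reduce" concrete, here is a minimal sketch in plain Python. It is not Druid code: the `segments` data and the helper names `partial_aggregate` and `merge` are made up for illustration. It only shows the general shape of the computation, where each segment is aggregated on its own and the partial results are then merged.

```python
from functools import reduce

# Hypothetical per-segment rows: (dimension value, metric value).
segments = [
    [("us", 3), ("de", 1), ("us", 2)],
    [("de", 4), ("fr", 5)],
    [("us", 1), ("fr", 2)],
]

def partial_aggregate(rows):
    """Sum the metric per dimension value within a single segment."""
    totals = {}
    for dim, value in rows:
        totals[dim] = totals.get(dim, 0) + value
    return totals

def merge(left, right):
    """Combine two partial results; valid because addition is associative."""
    merged = dict(left)
    for dim, value in right.items():
        merged[dim] = merged.get(dim, 0) + value
    return merged

# Each segment could be aggregated on a different node; only the small
# partial results need to be brought together for the final merge.
partials = [partial_aggregate(rows) for rows in segments]
result = reduce(merge, partials, {})
print(result)
```

This pattern works for any aggregate whose partial results can be merged. Iterative model training does not obviously fit it, since each pass depends on the output of the previous one.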

Are you talking about training or model evaluation? Or either one?

It *might* be possible to have a likeness mechanism, whereby you can pass
in a model as a filter and aggregate on rows (dimension tuples?) that match
the model by some minimum criteria, but I'm not really sure what utility
that would have. Maybe as a quick backtesting engine? Going down this
route, though, I feel like I have a solution in search of a problem.
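A rough sketch of what I mean, in plain Python rather than anything Druid actually exposes. Everything here is hypothetical: `aggregate_matching`, the `toy_model` scoring function standing in for a pre-trained model, the threshold, and the row fields.

```python
def aggregate_matching(rows, model, threshold, metric):
    """Count and sum `metric` over rows whose model score is >= threshold."""
    count, total = 0, 0.0
    for row in rows:
        if model(row) >= threshold:
            count += 1
            total += row[metric]
    return {"count": count, "sum": total}

def toy_model(row):
    """Stand-in for a pre-trained model: returns a score in [0, 1]."""
    return 0.9 if row["country"] == "us" and row["latency_ms"] > 200 else 0.1

# Hypothetical rows, each a dimension tuple plus one metric.
rows = [
    {"country": "us", "latency_ms": 250, "revenue": 10.0},
    {"country": "us", "latency_ms": 100, "revenue": 4.0},
    {"country": "de", "latency_ms": 300, "revenue": 7.0},
    {"country": "us", "latency_ms": 400, "revenue": 2.5},
]

summary = aggregate_matching(rows, toy_model, threshold=0.5, metric="revenue")
print(summary)
```

The model acts purely as a row filter here, so the aggregation itself stays an ordinary count and sum that could be merged across segments.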

On Mon, Jan 27, 2020 at 12:11 AM Driesprong, Fokko <fo...@driesprong.frl>
wrote:

> > Vertica has it. Good idea to introduce it in Druid.
>
> I'm not sure this is a valid argument. With this argument, you can
> introduce anything into Druid. I think it is good to be opinionated, and to
> decide as a community why we do or don't introduce ML possibilities into
> the software.
>
> For example, databases like Postgres and BigQuery allow users to build
> simple regression models:
> https://cloud.google.com/bigquery-ml/docs/bigqueryml-intro. I also don't
> think it is that hard to introduce linear regression using gradient
> descent into Druid:
> https://spin.atomicobject.com/2014/06/24/gradient-descent-linear-regression/
> However, how many people are going to use this?
>
> For me, it makes more sense to have tooling around Druid to slice and
> dice the data that you need, and do the ML stuff in sklearn, or even in
> Spark. For example, using https://github.com/druid-io/pydruid or having the
> ability to use Spark to read directly from the deep storage.
>
> Introducing models using stored procedures (SPs) or UDFs is also a
> possibility, but here I share Sayat's concerns about performance and
> scalability.
>
> Cheers, Fokko
>
>
>
> On Sat, Jan 25, 2020 at 08:51, Gaurav Bhatnagar <gaura...@gmail.com> wrote:
>
> > +1
> >
> > Vertica has it. Good idea to introduce it in Druid.
> >
> > On Mon, Jan 13, 2020 at 12:52 AM Dusan Maric <thema...@gmail.com> wrote:
> >
> > > +1
> > >
> > > That would be a great idea! Thanks for sharing this.
> > >
> > > Would just like to chime in on Druid + ML model cases: predictions and
> > > anomaly detection on top of TensorFlow ❤
> > >
> > > Regards,
> > >
> > > On Fri, Jan 10, 2020 at 6:41 AM Roman Leventov <leventov...@gmail.com>
> > > wrote:
> > >
> > > > Hello Druid developers, what do you think about the future of Druid &
> > > > machine learning?
> > > >
> > > > Druid has been great at complex aggregations. Could (should?) it make
> > > > inroads into ML? Perhaps aggregators which apply the rows against some
> > > > pre-trained model and summarize results.
> > > >
> > > > Should model training stay completely external to Druid, or could it be
> > > > incorporated into Druid's data lifecycle on a conceptual level, such as a
> > > > recurring "indexing" task which stores the result (the model) in Druid's
> > > > deep storage, with the model automatically loaded on historical nodes as
> > > > needed (just like segments) and certain aggregators picking up the latest
> > > > model?
> > > >
> > > > Does this make any sense? In what cases will Druid & ML work well
> > > > together, and in what cases should ML stay Spark's prerogative?
> > > >
> > > > I would be very interested to hear any thoughts on the topic, vague
> > > > ideas and questions.
> > > >
> > >
> > >
> > > --
> > > Dušan Marić
> > > mob.: +381 64 1124779 | e-mail: thema...@gmail.com | skype: themaric
> > >
> >
>