> One corner case is sketches which are time series, so models could be
> applied to them individually.

Or there may be a case for composable models that have some sort of
intermediate stage. I don't know of any models that have intermediate stages
which are associative and commutative, but if there were, it might be a way
to quickly derive new models by combining intermediate stages in an
ad-hoc fashion.
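
To make "associative and commutative intermediate stage" concrete, here is a
minimal sketch. It is an illustration only, not something proposed in this
thread and not a Druid API. It assumes a one-variable least-squares fit whose
intermediate state is a handful of running sums. The class name
RegressionState and the helper build are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionState:
    """Hypothetical mergeable state for fitting y = slope * x + intercept.

    The state is just running sums, so merging two states is element-wise
    addition, which is associative and commutative.
    """
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add(self, x: float, y: float) -> "RegressionState":
        # Fold a single (x, y) row into the state.
        return RegressionState(self.n + 1, self.sx + x, self.sy + y,
                               self.sxx + x * x, self.sxy + x * y)

    def merge(self, other: "RegressionState") -> "RegressionState":
        # Combine two partial states, e.g. one per segment.
        return RegressionState(self.n + other.n, self.sx + other.sx,
                               self.sy + other.sy, self.sxx + other.sxx,
                               self.sxy + other.sxy)

    def fit(self) -> tuple:
        # Closed-form least-squares solution from the accumulated sums.
        denom = self.n * self.sxx - self.sx * self.sx
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


def build(points) -> RegressionState:
    # Hypothetical helper: accumulate a state from an iterable of (x, y).
    state = RegressionState()
    for x, y in points:
        state = state.add(x, y)
    return state


# Two "segments" of made-up rows lying on y = 2x + 1.
segment_a = build([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
segment_b = build([(3.0, 7.0), (4.0, 9.0), (5.0, 11.0)])
slope, intercept = segment_a.merge(segment_b).fit()
```

Whether richer models admit a similarly cheap mergeable state is exactly the
open question raised above. This sketch only covers the simplest case.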


On Tue, Jan 28, 2020 at 12:39 AM Roman Leventov <leventov...@gmail.com>
wrote:

> However, I now see Charles' point -- the data which is typically stored
> in Druid rows is simple and is not something models are typically applied
> to. Timeseries themselves (that is, the results of timeseries queries in
> Druid) may be an input for anomaly detection or phase transition models,
> but there is no point in applying them inside Druid.
>
> One corner case is sketches which are time series, so models could be
> applied to them individually.
>
> On Tue, 28 Jan 2020 at 08:59, Roman Leventov <leventov...@gmail.com>
> wrote:
>
> > I was thinking about model training at Druid indexing side and evaluation
> > at Druid querying side.
> >
> > The advantage Druid has over Spark at querying is faster row filtering
> > thanks to bitset indexes. But since model evaluation is a pretty heavy
> > operation (I suppose; does anyone have ballpark time estimates? How does
> > it compare to a sketch update?), row scanning may not be the bottleneck,
> > and therefore there is no significant reason to use Druid instead of just
> > plugging a Spark engine into Druid segments.
> >
> > On the indexing side, the Druid indexer may be considered a
> > general-purpose job scheduler, so that somebody who already has Druid may
> > leverage it instead of setting up a separate Airflow scheduler.
> >
> > On Tue, 28 Jan 2020, 06:46 Charles Allen, <cral...@apache.org> wrote:
> >
> >> > it makes more sense to have tooling around Druid, to do slice and dice
> >> > the data that you need, and do the ml stuff in sklearn, or even in spark
> >>
> >> I agree with this sentiment. Druid as an execution engine is very good
> >> at doing distributed aggregation (distributed reduce). What advantage
> >> does Druid as an engine have that Spark does not for ML?
> >>
> >> Are you talking about training or model evaluation, or both?
> >>
> >> It *might* be possible to have a likeness mechanism, whereby you can
> >> pass in a model as a filter and aggregate on rows (dimension tuples?)
> >> that match the model by some minimum criteria, but I'm not really sure
> >> what utility that would be. Maybe as a quick backtesting engine? I feel
> >> like I'm a solution searching for a problem going down this route though.
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Mon, Jan 27, 2020 at 12:11 AM Driesprong, Fokko <fo...@driesprong.frl>
> >> wrote:
> >>
> >> > > Vertica has it. Good idea to introduce it in Druid.
> >> >
> >> > I'm not sure if this is a valid argument. With this argument, you can
> >> > introduce anything into Druid. I think it is good to be opinionated
> >> > as a community about why we do or don't introduce ML possibilities
> >> > into the software.
> >> >
> >> > For example, databases like Postgres and BigQuery allow users to do
> >> > simple regression models:
> >> > https://cloud.google.com/bigquery-ml/docs/bigqueryml-intro. I also
> >> > don't think it is that hard to introduce linear regression using
> >> > gradient descent into Druid:
> >> > https://spin.atomicobject.com/2014/06/24/gradient-descent-linear-regression/
> >> > However, how many people are going to use this?
> >> >
> >> > For me, it makes more sense to have tooling around Druid, to do slice
> >> > and dice the data that you need, and do the ml stuff in sklearn, or
> >> > even in spark. For example using https://github.com/druid-io/pydruid
> >> > or having the ability to use Spark to read directly from the deep
> >> > storage.
> >> >
> >> > Introducing models using SP or UDFs is also a possibility, but here I
> >> > share the concerns of Sayat when it comes to performance and
> >> > scalability.
> >> >
> >> > Cheers, Fokko
> >> >
> >> >
> >> >
> >> > On Sat, 25 Jan 2020 at 08:51, Gaurav Bhatnagar <gaura...@gmail.com>
> >> > wrote:
> >> >
> >> > > +1
> >> > >
> >> > > Vertica has it. Good idea to introduce it in Druid.
> >> > >
> >> > > On Mon, Jan 13, 2020 at 12:52 AM Dusan Maric <thema...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > +1
> >> > > >
> >> > > > That would be a great idea! Thanks for sharing this.
> >> > > >
> >> > > > Would just like to chime in on Druid + ML model cases: predictions
> >> > > > and anomaly detection on top of TensorFlow ❤
> >> > > >
> >> > > > Regards,
> >> > > >
> >> > > > On Fri, Jan 10, 2020 at 6:41 AM Roman Leventov
> >> > > > <leventov...@gmail.com> wrote:
> >> > > >
> >> > > > > Hello Druid developers, what do you think about the future of
> >> > > > > Druid & machine learning?
> >> > > > >
> >> > > > > Druid has been great at complex aggregations. Could (should?) it
> >> > > > > make inroads into ML? Perhaps aggregators which apply the rows
> >> > > > > against some pre-trained model and summarize results.
> >> > > > >
> >> > > > > Should model training stay completely external to Druid, or
> >> > > > > could it be incorporated into Druid's data lifecycle on a
> >> > > > > conceptual level, such as a recurring "indexing" task which
> >> > > > > stores the result (the model) in Druid's deep storage, the model
> >> > > > > automatically loaded on historical nodes as needed (just like
> >> > > > > segments) and certain aggregators pick up the latest model?
> >> > > > >
> >> > > > > Does this make any sense? In what cases will Druid & ML work
> >> > > > > well together, and in what cases will they not, so that ML
> >> > > > > should stay Spark's prerogative?
> >> > > > >
> >> > > > > I would be very interested to hear any thoughts on the topic,
> >> > > > > vague ideas and questions.
> >> > > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Dušan Marić
> >> > > > mob.: +381 64 1124779 | e-mail: thema...@gmail.com | skype: themaric
> >> > > >
> >> > >
> >> >
> >>
> >
>
