On distributions, I did not find anything multivariate that is Mahout
Matrix-based. Hopefully I just did not look well enough. Everything
univariate seems pretty spotty. Aside from that, I need Scala traits, and I
find it extremely inelegant (un-Scala, if you will) to write something like
`new MultivariateUniformDistribution(mu, sigma).sample()`, so I really just
DSL-bridged for the most part. There are enough third-party choices that it
is not worth filling the gaps ourselves.
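To illustrate what I mean by "DSL-bridged": hide the new-object ceremony behind a small function so call sites read naturally. This is a minimal sketch only; `DistBridge` and `mvnSampleDiag` are hypothetical names, not the actual Mahout API, and the sampler is a self-contained Box-Muller stand-in rather than a real third-party backend.

```scala
import scala.util.Random

// Hypothetical DSL-style bridge: a one-line call site instead of
// `new SomeDistribution(mu, sigma).sample()`. Illustrative names only.
object DistBridge {
  private val rng = new Random(1234)

  // Standard normal draw via the Box-Muller transform.
  private def stdNormal(): Double = {
    val u1 = rng.nextDouble()
    val u2 = rng.nextDouble()
    math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.Pi * u2)
  }

  // Diagonal-covariance multivariate normal sample as a terse function call.
  def mvnSampleDiag(mu: Array[Double], sigmaDiag: Array[Double]): Array[Double] =
    mu.zip(sigmaDiag).map { case (m, s) => m + math.sqrt(s) * stdNormal() }
}

// Usage: reads like a function, not constructor ceremony.
val x = DistBridge.mvnSampleDiag(Array(0.0, 5.0), Array(1.0, 4.0))
```

In practice the bridge would delegate to whatever third-party sampler is convenient; the point is only the call-site shape.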

On step-recorded evolutionary search: after my literature search on the
topic, it does not look like even a distant third-best choice, in
particular under big-data training settings.

First, I did not find any head-to-head comparisons of it with the top
choices. It is not included in the AMPLab survey of top search choices.
GP-EI is Netflix's choice, for example. So there is very little convincing
data to go on to begin with, and given the lack of such comparisons, the
next best thing is to copy what others do here.

Second, under big-data settings, every (training) data point is precious.
In Spark specifically, unlike MR, since we want to retain as much data in
RAM as possible and avoid spills, the best performance is usually achieved
by sequentially semaphoring trainings rather than throwing a whole bunch of
them out at once. This is especially true where companies are extremely
anemic in provisioning the needed hardware, for whatever reason. In that
sense, exploration algorithms that can make better inferences after each
new data point and arrive at a reasonably performing model in ~20..30
sequential trainings are infinitely preferable to those that require a
whole bunch of trainings before they can even figure out the next centroid
of trials. I am not even sure step-recorded search was ever tried outside
SGD settings, where data points are abundant albeit incomplete.
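The sequential regime I have in mind looks roughly like this toy sketch: one expensive training at a time, with the proposer re-conditioning on the full history after every run. Everything here is hypothetical and illustrative, `train` stands in for a real Spark training, and the proposer is crude best-so-far-plus-Gaussian-jitter where a real GP-EI acquisition step would go.

```scala
import scala.util.Random

// Toy sketch of sequential hyperparameter search on a budget of ~20..30
// trainings. Illustrative only: a real implementation would replace
// `train` with an actual (expensive) Spark training and `proposeNext`
// with a proper GP-EI acquisition over the observed history.
object SequentialSearch {
  private val rng = new Random(42)

  // Stand-in for an expensive training; returns a loss for hyperparam h.
  def train(h: Double): Double = (h - 0.3) * (h - 0.3)

  // Propose the next trial using *all* prior (h, loss) observations.
  def proposeNext(history: Seq[(Double, Double)]): Double =
    if (history.isEmpty) rng.nextDouble()
    else {
      val (bestH, _) = history.minBy(_._2)
      // Jitter around the best point so far, clamped to [0, 1].
      math.min(1.0, math.max(0.0, bestH + 0.1 * rng.nextGaussian()))
    }

  def search(budget: Int): (Double, Double) = {
    var history = Vector.empty[(Double, Double)]
    for (_ <- 1 to budget) {        // strictly sequential trainings
      val h = proposeNext(history)  // inference uses every prior point
      history = history :+ ((h, train(h)))
    }
    history.minBy(_._2)             // best (h, loss) found
  }
}

val (bestH, bestLoss) = SequentialSearch.search(25)
```

The payoff of this shape is exactly the point above: each training result immediately improves the next proposal, so a small sequential budget can land a reasonable model without ever running a large batch of trials in parallel.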



On Tue, Aug 26, 2014 at 8:32 AM, Ted Dunning <[email protected]> wrote:

> On Mon, Aug 25, 2014 at 2:40 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > This work is obviously also interesting in that it
> > establishes probabilistic framework in Mahout (distributions & gaussian
> > process).
> >
>
> We already have that.
>
> (distributions not GP)
>
> Note that we also have an implementation of recorded step evolutionary
> programming that works really well for hyper-parameter search.  I don't
> like the way that the API turned out (too hard to understand).
>
