Hi Janardhan,

The number of parameters could be rather large; that's certainly an issue 
for Bayesian Optimization.  A perfect implementation would, perhaps, pick 
a sample of parameters and a sample of the dataset for every iteration. It 
seems that Sobol sequences require generating primitive polynomials of 
large degree.  What is better: a higher-dimensional B.O., or a 
lower-dimensional one combined with parameter sampling?  Probably the 
latter.  By the way, in cases where parameters feed into heuristics, there 
may be considerable independence across the set of parameters, especially 
when conditioned on a specific dataset record.  Each heuristic targets 
certain situations that arise in some records.  I'm not sure how to take 
advantage of this.
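
Just to make that idea concrete, here is a rough Python sketch of the 
"lower-dimensional B.O. plus parameter sampling" variant; bo_suggest and 
evaluate_loss are only placeholders for pieces we have not designed yet:

    import numpy as np

    def optimize_with_subsampling(all_params, n_iters, subset_size,
                                  sample_frac, bo_suggest, evaluate_loss,
                                  rng=np.random):
        # Start each parameter at the midpoint of its (lo, hi) bounds.
        current = {p: 0.5 * (lo + hi) for p, (lo, hi) in all_params.items()}
        best, best_loss = dict(current), float("inf")
        for _ in range(n_iters):
            # Optimize only a small random subset of parameters this round;
            # all other parameters stay fixed at their current values.
            subset = rng.choice(list(all_params), size=subset_size,
                                replace=False)
            proposal = dict(current)
            proposal.update(bo_suggest({p: all_params[p] for p in subset}))
            # Evaluate the loss on a random sample of the data records only.
            loss = evaluate_loss(proposal, sample_frac=sample_frac)
            if loss < best_loss:
                best, best_loss = dict(proposal), loss
            current = proposal
        return best, best_loss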

Thanks,
Sasha



From:   Janardhan Pulivarthi <janardhan.pulivar...@gmail.com>
To:     Alexandre V Evfimievski <evf...@us.ibm.com>, npan...@us.ibm.com, 
dev@systemml.apache.org
Date:   08/10/2017 09:39 AM
Subject:        Re: Bayesian optimizer support for SystemML.



Hi Sasha,

And one more thing I would like to ask: what are you thinking about the 
`sobol` function?  What are the dimension requirement and the pattern of 
sampling?  Also, please help me understand what tasks, exactly, we are 
going to optimize in SystemML.

Surrogate slice sampling - what are your thoughts about it?

Thank you very much,
Janardhan 

On Wed, Jul 26, 2017 at 12:25 AM, Alexandre V Evfimievski <
evf...@us.ibm.com> wrote:
Hi, Janardhan,

We are still studying Bayesian Optimization (B.O.); you are ahead of us!  
Just one comment:  The "black box" loss function that is being optimized 
is not always totally black.  Sometimes it is a sum of many small 
black-box functions.  Suppose we want to train a complex system with many 
parameters over a large dataset.  The system involves many heuristics, and 
the parameters feed into these heuristics.  We want to minimize a loss 
function, which is a sum of individual losses, one per data record.  We 
want to use B.O. to find an optimal vector of parameters.  The parameters 
affect the system's behavior in complex ways and do not allow for the 
computation of a gradient.  However, because the loss is a sum of many 
losses, when running B.O., we have a choice: either to run each test over 
the entire dataset, or to run over a small sample of the dataset (but try 
more parameter vectors per hour, say).  The smaller the sample, the higher 
the variance of the loss.  I'm not sure which implementation of B.O. is 
best suited to handle such a case.
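
To make the trade-off concrete, here is a small Python sketch of such a 
sampled loss; per_record_loss and the record list are hypothetical 
stand-ins for whatever the system actually computes:

    import numpy as np

    def sampled_loss(params, records, sample_frac, per_record_loss,
                     rng=np.random):
        # The true objective is the mean of per-record losses over the whole
        # dataset; estimating it from a random subsample is cheaper per
        # evaluation, but hands B.O. a noisier observation (variance ~ 1/n).
        n = max(1, int(sample_frac * len(records)))
        idx = rng.choice(len(records), size=n, replace=False)
        return float(np.mean([per_record_loss(params, records[i])
                              for i in idx]))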

Thanks,
Alexandre (Sasha)



From:        Janardhan Pulivarthi <janardhan.pulivar...@gmail.com>
To:        dev@systemml.apache.org
Date:        07/25/2017 10:33 AM
Subject:        Re: Bayesian optimizer support for SystemML.



Hi Niketan and Mike,

As we are trying to implement this Bayesian optimization, should we take
input from more committers, since this optimizer seems to have a couple of
possible implementation approaches?  We may need to find out which one
suits us best.

Thanks,
Janardhan

On Sat, Jul 22, 2017 at 3:41 PM, Janardhan Pulivarthi <
janardhan.pulivar...@gmail.com> wrote:

> Dear committers,
>
> We are planning to add Bayesian optimizer support for both the ML and
> deep learning tasks in SystemML. Relevant JIRA link:
> https://issues.apache.org/jira/browse/SYSTEMML-979
>
> The following is a simple outline of how we are going to implement it.
> Please feel free to make any kind of changes in this Google Docs link:
> http://bit.do/systemml-bayesian
>
> Description:
>
> Bayesian optimization is a sequential design strategy for global
> optimization of black-box functions that doesn’t require derivatives.
>
> Process:
>
>    1. First, we select the next point to evaluate, based on what looks
>       best given the iterations that have happened so far.
>
>    2. Candidate points are selected by sampling the search space with a
>       Sobol quasirandom sequence generator.
>
>    3. Gaussian process hyperparameters are sampled with the surrogate
>       slice sampling method.
>
>
> Components:
>
>    1. Selecting the next point to evaluate.
>
> [image: nextpoint.PNG]
>
> We specify a uniform prior for the mean, m, and width-2 top-hat priors
> for each of the D length-scale parameters. As we expect the observation
> noise generally to be close to or exactly zero, the noise variance nu is
> given a horseshoe prior. The covariance amplitude theta0 is given a
> zero-mean, unit-variance lognormal prior, theta0 ~ ln N(0, 1).
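>
> Just to make these priors concrete, here is a rough numpy sketch of
> drawing one sample of the GP hyperparameters; the bounds on the mean and
> the exact horseshoe construction are my own assumptions, not a final
> design:
>
>     import numpy as np
>
>     def sample_gp_hypers(D, mean_bounds=(-1.0, 1.0), rng=np.random):
>         m = rng.uniform(*mean_bounds)         # uniform prior on the mean m
>         ell = rng.uniform(0.0, 2.0, size=D)   # width-2 top-hat length scales
>         # horseshoe prior on the noise: half-Cauchy scale times normal draw
>         nu = np.abs(rng.standard_cauchy() * rng.standard_normal())
>         theta0 = rng.lognormal(0.0, 1.0)      # amplitude ~ ln N(0, 1)
>         return m, ell, nu, theta0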
>
>
>
>    2. Generation of a quasirandom Sobol sequence.
>
> Which kind of Sobol patterns are needed?
>
> [image: sobol patterns.PNG]
>
> How many dimensions do we need?
>
> This paper constructs Sobol sequence generators for up to 21201
> dimensions. [pdf link:
> https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf ]
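>
> If we prototype in Python first, recent SciPy versions ship a Sobol
> generator in scipy.stats.qmc; just as an illustration of the kind of
> candidate set we need (the dimension and bounds below are made up):
>
>     from scipy.stats import qmc
>
>     d = 10                                   # number of hyperparameters
>     sampler = qmc.Sobol(d=d, scramble=True)  # scrambled Sobol sequence
>     points = sampler.random_base2(m=7)       # 2**7 = 128 points in [0, 1)^d
>     # map each coordinate into its actual parameter range
>     candidates = qmc.scale(points, [0.0] * d, [1.0] * d)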
>
>
>
>    3. Surrogate slice sampling.
>
> [image: surrogate data sampling.PNG]
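>
> As a starting point, the basic univariate slice sampler (Neal, 2003) that
> the surrogate-data method builds on could look roughly as below; the full
> scheme of Murray & Adams additionally re-parameterizes the GP through
> surrogate observations, which is not shown here:
>
>     import numpy as np
>
>     def slice_sample(x0, log_density, width=1.0, max_steps=50,
>                      rng=np.random):
>         # Draw the slice level under the (log) density at the current point.
>         log_y = log_density(x0) + np.log(rng.uniform())
>         # Step out an interval [left, right] that brackets the slice.
>         left = x0 - width * rng.uniform()
>         right = left + width
>         for _ in range(max_steps):
>             if log_density(left) < log_y:
>                 break
>             left -= width
>         for _ in range(max_steps):
>             if log_density(right) < log_y:
>                 break
>             right += width
>         # Shrink the interval until a proposal lands on the slice.
>         while True:
>             x1 = rng.uniform(left, right)
>             if log_density(x1) >= log_y:
>                 return x1
>             if x1 < x0:
>                 left = x1
>             else:
>                 right = x1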
>
>
> References:
>
> 1. For the next point to evaluate:
>
> https://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf
>
>  http://www.dmi.usherb.ca/~larocheh/publications/gpopt_nips_appendix.pdf
>
>
> 2. QuasiRandom Sobol Sequence Generator:
>
> https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf
>
>
> 3. Surrogate Slice Sampling:
>
> http://homepages.inf.ed.ac.uk/imurray2/pub/10hypers/hypers.pdf
>
>
>
> Thank you so much,
>
> Janardhan
>
>
>
>






