Hi, 

Thanks a lot everybody for the feedback. 

The package can certainly be made a stand-alone drop-in replacement for 
np.random. There are many points raised and unraised in favor of this, 
and it is easy to accomplish.  I will create a stand-alone package on github, 
but would still appreciate some help in reviewing it 
and making it available at PyPI.

Interestingly, Nathaniel's link to a representative changes, specifically 

    
https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L1724-L1833

point at an unused code borrowed directly from mtrand/distributions.c:

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/distributions.c#L262-L297

More representative change would be the implementation of Student's 
T-distribution:

https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L232-L262
 

The module under review, similarly to randomstate package, provides alternative 
basic pseudo-random number generators (BRNGs), like MT2203, MCG31, MRG32K3A, 
Wichmann-Hill. The scope of support differ, with randomstate implementing some 
generators absent in MKL and vice-versa. 

Thinking about the possibility of providing the functionality of this module 
within the framework of randomstate, I find that randomstate implements 
samplers from statistical distributions as functions that take the state of the 
underlying BRNG, and produce a single variate, e.g.:

https://github.com/bashtage/ng-numpy-randomstate/blob/master/randomstate/distributions.c#L23-L26
 

This design stands in a way of efficient use of MKL, which generates a whole 
vector of variates at a time. This can be done faster than sampling a variate 
at a time by using vectorized instructions.  So I wrote mkl_distributions.cpp 
to provide functions that return a given size vector of sampled variates from 
each supported distribution.

mklrand.pyx was then written by modifying mtrand.pyx to work with such vector 
generators.   In particular, this allowed for efficient sampling from product 
distributions of Poisson distributions with different rate parameters, which is 
implemented in MKL:

https://software.intel.com/en-us/node/521894 

https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L1071
 


Another point already raised by Nathaniel is that for numpy's randomness 
ideally should provide a way to override default algorithm for sampling from a 
particular distribution.  For example RandomState object that implements PCG 
may rely on default acceptance-rejection algorithm for sampling from Gamma, 
while the RandomState object that provides interface to MKL might want to call 
into MKL directly.

While at this topic, I also would like to point out the need for C-API 
interface to randomness, particularly felt writing parallel algorithms, where 
Python's GIL and use of Lock() in RandomState hurt scalability.

Oleksandr

-----Original Message-----
From: NumPy-Discussion [mailto:numpy-discussion-boun...@scipy.org] On Behalf Of 
Nathaniel Smith
Sent: Wednesday, October 26, 2016 2:25 PM
To: Discussion of Numerical Python <numpy-discussion@scipy.org>
Subject: Re: [Numpy-discussion] Intel random number package

On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor <jtaylor.deb...@googlemail.com> 
wrote:
> On 10/26/2016 06:00 PM, Julian Taylor wrote:
>>
>> On 10/26/2016 10:59 AM, Ralf Gommers wrote:
>>>
>>>
>>>
>>> On Wed, Oct 26, 2016 at 8:33 PM, Julian Taylor 
>>> <jtaylor.deb...@googlemail.com 
>>> <mailto:jtaylor.deb...@googlemail.com>>
>>> wrote:
>>>
>>>     On 26.10.2016 06:34, Charles R Harris wrote:
>>>     > Hi All,
>>>     >
>>>     > There is a proposed random number package PR now up on github:
>>>     > https://github.com/numpy/numpy/pull/8209
>>>     <https://github.com/numpy/numpy/pull/8209>. It is from
>>>     > oleksandr-pavlyk <https://github.com/oleksandr-pavlyk
>>>     <https://github.com/oleksandr-pavlyk>> and implements
>>>     > the number random number package using MKL for increased speed.
>>> I think
>>>     > we are definitely interested in the improved speed, but I'm 
>>> not sure
>>>     > numpy is the best place to put the package. I'd welcome any 
>>> comments on
>>>     > the PR itself, as well as any thoughts on the best way 
>>> organize or use
>>>     > of this work. Maybe scikit-random
>>>
>>>
>>> Note that this thread is a continuation of 
>>> https://mail.scipy.org/pipermail/numpy-discussion/2016-July/075822.h
>>> tml
>>>
>>>
>>>
>>>     I'm not a fan of putting code depending on a proprietary library
>>>     into numpy.
>>>     This should be a standalone package which may provide the same 
>>> interface
>>>     as numpy.
>>>
>>>
>>> I don't really see a problem with that in principle. Numpy can use 
>>> Intel MKL (and Accelerate) as well if it's available. It needs some 
>>> thought put into the API though - a ``numpy.random_intel`` module is 
>>> certainly not what we want.
>>>
>>
>> For me there is a difference between being able to optionally use a 
>> proprietary library as an alternative to free software libraries if 
>> the user wishes to do so and offering functionality that only works 
>> with non-free software.
>> We are providing a form of advertisement for them by allowing it (hey 
>> if you buy this black box that you cannot modify or use freely you 
>> get this neat numpy feature!).
>>
>> I prefer for the full functionality of numpy to stay available with a 
>> stack of community owned software, even if it may be less powerful 
>> that way.
>
> But then if this is really just the same random numbers numpy already 
> provides just faster, it is probably acceptable in principle. I 
> haven't actually looked at the PR yet.

The RNG stream is totally different, so yeah, it can't just be a silent drop-in 
replacement like BLAS/LAPACK.

The patch also adds ~10,000 lines of code; here's an example of what some of it 
looks like:

    
https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L1724-L1833

I don't see how we can realistically commit to maintaining this.

I'm also not really seeing how shipping it as part of numpy provides extra 
benefits to maintainers or users? AFAICT right now it's basically structured as 
a standalone library that's been dropped into the numpy source tree, and it 
would be just as easy to ship separately (or am I wrong?). And since the public 
API is that all the functionality comes from importing this specific new module 
('numpy.random_intel'), it'd be a one-line change for users to import from a 
non-numpy namespace, like 'mkl.random' or whatever. If it were more integrated 
with the rest of numpy then the trade-offs would be more complicated, but in 
its present form this seems like an easy call.

The other question is whether it could/should change to *become* more 
integrated... that's more tricky. There's been some work towards supporting 
swappable backends inside np.random; but the focus has mostly been on allowing 
new core generators, though, and this code seems to want to take over the whole 
thing (core generator + distributions), so even once the swappable backends 
stuff is working I'm not sure it would be relevant here. The one case I can 
think of that does seem promising is that if we get an API for users to say "I 
don't care about stream compatibility, just give me un-reproducible variates as 
fast as you can", then it might make sense for that to silently use MKL if 
available -- this would be pretty analogous to the use of MKL in np.linalg. But 
we don't have that API yet, I'm not sure how the MKL fallback could be 
maintainably implemented given that it would require somehow swapping the 
entire RandomState implementation, and it's entirely possible that once we 
figure 
 out solutions to those then it'd still make sense for the actual MKL wrappers 
to live in a third-party library that numpy imports.

-n

--
Nathaniel J. Smith -- https://vorpus.org 
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion

Reply via email to