Re: [Numpy-discussion] Low-level API for Random

Neal Becker Fri, 20 Sep 2019 04:20:07 -0700

I have used the C API in the past, and would like to see a convenient and
stable way to do this.  Currently I'm using randomgen, but calling into the
Python API from C++.  The inefficiency is amortized by generating and
caching batches of results.
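
A minimal sketch of the batching idea, written in Python for brevity rather
than C++.  The class name BatchedSource, the draw_batch callback and the
refills counter are hypothetical, and random.Random only stands in for the
expensive call into the generator's Python API:

```python
import random
from typing import Callable, List


class BatchedSource:
    """Hand out one value at a time while drawing from the generator in batches."""

    def __init__(self, draw_batch: Callable[[int], List[float]], batch_size: int = 4096):
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._draw_batch = draw_batch
        self._batch_size = batch_size
        self._buffer: List[float] = []
        self._pos = 0
        self.refills = 0  # how many expensive calls have been made so far

    def next(self) -> float:
        # Refill only when the cached batch is exhausted.
        if self._pos >= len(self._buffer):
            self._buffer = self._draw_batch(self._batch_size)
            self._pos = 0
            self.refills += 1
        value = self._buffer[self._pos]
        self._pos += 1
        return value


# Stand-in for the expensive call (assumption: any callable returning n values works).
_rng = random.Random(12345)


def draw_normals(n: int) -> List[float]:
    return [_rng.gauss(0.0, 1.0) for _ in range(n)]


source = BatchedSource(draw_normals, batch_size=1000)
samples = [source.next() for _ in range(2500)]
```

With a batch size of 1000, the 2500 draws above cost three calls to
draw_batch instead of 2500, which is where the amortization comes from.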


I thought randomgen was supposed to be the future of numpy random, so
I've based my code on that.

On Fri, Sep 20, 2019 at 6:08 AM Ralf Gommers <[email protected]> wrote:
>
>
>
> On Fri, Sep 20, 2019 at 5:29 AM Robert Kern <[email protected]> wrote:
>>
>> On Thu, Sep 19, 2019 at 11:04 PM Ralf Gommers <[email protected]> wrote:
>>>
>>>
>>>
>>> On Thu, Sep 19, 2019 at 4:53 PM Robert Kern <[email protected]> wrote:
>>>>
>>>> On Thu, Sep 19, 2019 at 5:24 AM Ralf Gommers <[email protected]> wrote:
>>>>>
>>>>>
>>>>> On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard <[email protected]> wrote:
>>>>>>
>>>>>> There are some users of the NumPy C code in randomkit.  This was never 
>>>>>> officially supported.  There has been a long open issue to provide this 
>>>>>> officially.
>>>>>>
>>>>>> When I wrote randomgen I supplied .pxd files that make it simpler to 
>>>>>> write Cython code that uses the components.  The lower-level API has not 
>>>>>> had much scrutiny and is in need of a clean-up.   I thought this would 
>>>>>> also encourage users to extend the random machinery themselves as part 
>>>>>> of their project or code so as to minimize the requests for new (exotic) 
>>>>>> distributions to be included in Generator.
>>>>>>
>>>>>> Most of the generator functions follow a pattern random_DISTRIBUTION.  
>>>>>> Some have a bit more name mangling which can easily be cleaned up (like 
>>>>>> random_gauss_zig, which should become PREFIX_standard_normal).
>>>>>>
>>>>>> Ralf Gommers suggested unprefixed names.
>>>>>
>>>>>
>>>>> I suggested that the names should match the Python API, which I think 
>>>>> isn't quite the same. The Python API doesn't contain things like "gamma", 
>>>>> "t" or "f".
>>>>
>>>>
>>>> As the implementations evolve, they aren't going to match one-to-one 100%. 
>>>> The implementations are shared by the legacy RandomState. When we update 
>>>> an algorithm, we'll need to make a new function with the better algorithm 
>>>> for Generator to use, then we'll have two C functions roughly 
>>>> corresponding to the same method name (albeit on different classes). C 
>>>> doesn't give us as many namespace options as Python. We could rely on 
>>>> conventional prefixes to distinguish between the two classes of function 
>>>> (e.g. legacy_normal vs random_normal).
>>>
>>>
>>> That seems simple and clear.
>>>
>>>> There are times when it would be nice to be more descriptive about the 
>>>> algorithm difference (e.g. random_normal_polar vs random_normal_ziggurat),
>>>
>>>
>>> We decided against versioning algorithms in NEP 19, so an update to an 
>>> algorithm would mean we'd want to get rid of the older version (unless it's 
>>> still in use by legacy). So AFAICT we'd never have both random_normal_polar 
>>> and random_normal_ziggurat present at the same time?
>>
>>
>> Well, we must because one's used by the legacy RandomState and one's used by 
>> Generator. :-)
>>
>>>
>>> I may be missing your point here, but if we have in Python 
>>> `Generator.normal` and can switch its implementation from polar to ziggurat 
>>> or vice versa without any deprecation, then why would we want to switch 
>>> names in the C API?
>>
>>
>> I didn't mean to suggest that we'd have an unbounded number of functions as 
>> we improve the algorithms, just that we might have 2 once we decide to 
>> change something about the algorithm. We need 2 to support both the improved 
>> algorithm in Generator and the legacy algorithm in RandomState. The current 
>> implementation of the C function would be copied to a new name (`legacy_foo` 
>> or whatever), then we'd make RandomState use that frozen copy, then we make 
>> the desired modifications to the main function that Generator is referencing 
>> (`random_foo`).
>>
>> Or we could just make those legacy copies now so that people get to use them 
>> explicitly under the legacy names, whatever they are, and we can feel more 
>> free to modify the main implementations. I suggested this earlier, but 
>> convinced myself that it wasn't strictly necessary. But then I admit I was 
>> more focused on the Python API stability than any promises about the 
>> C/Cython API.
>>
>> We might end up with more than 2 implementations if we need to change 
>> something about the function signature, for whatever reason, and we want to 
>> retain C/Cython API compatibility with older code. The C functions aren't 
>> necessarily going to be one-to-one to the Generator methods. They're just 
>> part of the implementation. So for example, if we wanted to, say, precompute 
>> some intermediate values from the given scalar parameters so we don't have 
>> to recompute them for each element of the `size`-large requested output, we 
>> might do that in one C function and pass those intermediate values as 
>> arguments to the C function that does the actual sampling. So we'd have two 
>> C functions for that one Generator method, and the sampling C function will 
>> not have the same signature as it did before the modification that 
>> refactored the work into two functions. In that case, I would not be so 
>> strict as to require that `Generator.foo` is one to one with `random_foo`.
>
>
> You're saying "be so strict" as if it were a bad thing, or a major effort. I 
> understand that in some cases a C API cannot be evolved in the same way as a 
> Python API, but in the example you're giving here I'd say you want one 
> function to be public, and one private. Making both public just exposes more 
> implementation details for no good reason, and will give us more maintenance 
> issues long-term.
>
> Anyway, this is not an issue today. If we try to keep Python and C APIs 
> matching, we can deal with possible difficulties with that if and when they 
> arise - should be infrequent.
>
> Cheers,
> Ralf
>
>>
>> To your point, though, we don't have to use gratuitously different names 
>> when there _is_ a one-to-one relationship. `random_gauss_zig` should be 
>> `random_normal`.
>>
>> --
>> Robert Kern
>> _______________________________________________
>> NumPy-Discussion mailing list
>> [email protected]
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>

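The "frozen legacy copy next to a main function that is free to change"
convention from the quoted discussion can be sketched as follows.  This is
Python rather than C, and only an illustration under stated assumptions: the
bodies are not NumPy's implementations, legacy_normal uses a plain polar
method without caching the second value, and random_normal uses an
inverse-CDF draw purely as a stand-in for whatever improved algorithm
Generator would use.

```python
import math
import random
from statistics import NormalDist


def legacy_normal(rng: random.Random) -> float:
    """Frozen copy for the legacy class: polar method, never edited again."""
    while True:
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        r2 = x * x + y * y
        if 0.0 < r2 < 1.0:
            return x * math.sqrt(-2.0 * math.log(r2) / r2)


def random_normal(rng: random.Random) -> float:
    """Main function: the algorithm may change between releases."""
    u = rng.random()
    while u == 0.0:  # inv_cdf requires 0 < u < 1
        u = rng.random()
    return NormalDist().inv_cdf(u)


# The legacy stream is reproducible for a given seed because its code is frozen.
first = [legacy_normal(random.Random(2019)) for _ in range(1)]
rng_a, rng_b = random.Random(2019), random.Random(2019)
stream_a = [legacy_normal(rng_a) for _ in range(5)]
stream_b = [legacy_normal(rng_b) for _ in range(5)]
new_stream = [random_normal(random.Random(2019)) for _ in range(5)]
```

Only the prefix tells the two apart, which is the point of the
legacy_normal vs random_normal suggestion: the legacy name promises a fixed
stream, the random name promises only the distribution.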

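The refactoring example from the quoted discussion (precompute intermediate
values once, then pass them to the function that does the sampling) can be
sketched with one public entry point and two private helpers.  The geometric
distribution and every function name here are chosen for illustration only;
this is not how NumPy implements it.

```python
import math
import random
from typing import List


def _geometric_setup(p: float) -> float:
    """Private helper: compute log(1 - p) once per call, not once per element."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    return math.log1p(-p)


def _geometric_sample(rng: random.Random, log_q: float) -> int:
    """Private sampler: takes the precomputed value, so its signature differs from the public one."""
    u = rng.random()  # in [0, 1), so 1 - u is in (0, 1]
    return max(1, math.ceil(math.log(1.0 - u) / log_q))


def random_geometric(rng: random.Random, p: float, size: int) -> List[int]:
    """Public entry point: keeps the (p, size) signature of the Python-level method."""
    log_q = _geometric_setup(p)
    return [_geometric_sample(rng, log_q) for _ in range(size)]


draws = random_geometric(random.Random(7), 0.25, 1000)
```

Callers only ever see random_geometric, so the split into setup and sampling
can change again later without touching the public signature.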

-- 
Those who don't understand recursion are doomed to repeat it
