Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-06-09 Thread Antony Lee
2015-05-29 14:06 GMT-07:00 Antony Lee antony@berkeley.edu:


 A proof-of-concept implementation, still missing tests, is tracked as
 #5911.  It includes the patch proposed in #5158 as an example of how to
 include an improved version of random.choice.


 Tests are in now (whether we should bundle in pickles of old versions to
 make sure they are still unpickled correctly and outputs of old random
 streams to make sure they are still reproduced is a good question, though).
 Comments welcome.


Kindly bumping the issue.

Antony
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-29 Thread Antony Lee

 A proof-of-concept implementation, still missing tests, is tracked as
 #5911.  It includes the patch proposed in #5158 as an example of how to
 include an improved version of random.choice.


Tests are in now (whether we should bundle in pickles of old versions to
make sure they are still unpickled correctly and outputs of old random
streams to make sure they are still reproduced is a good question, though).
Comments welcome.

Antony


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-25 Thread Daπid
On 24 May 2015 at 22:30, Sturla Molden sturla.mol...@gmail.com wrote:

 Personally I think we should only make guarantees about the data types,
 array shapes, and things like that, but not about the values. Those who
 need a particular version of NumPy for exact reproducibility should
 install the version of Python and NumPy they need. That is why virtual
 environments exist.


But there is a lot of legacy code out there that doesn't specify the
version required; and in most cases the original author cannot even be
asked.

Tests are a particularly annoying case. For example, when testing an
algorithm, it is usually good practice to record the number of iterations as
well as the result; consider it an early warning that we have changed
something we possibly didn't mean to, even if the result is correct. If we
want to support several NumPy versions, and the algorithm has any
randomness, the tests would have to be duplicated, or we would have to find
a seed that gives the exact same results. Thus, keeping different versions
lets us compare the results against the old API, without needing to
duplicate the tests. A lot fewer people will get annoyed.
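
A minimal sketch of the kind of test described above. The function
`random_search` is a made-up toy algorithm used only for illustration; it is
not part of NumPy or of any code mentioned in this thread. The test pins both
the result and the iteration count to a seeded RandomState.

```python
import numpy as np


def random_search(rs, target=0.99, max_iter=10000):
    """Toy algorithm: draw uniforms until one exceeds `target`.

    Returns (value, n_iter) so a test can check the iteration count too.
    """
    for n_iter in range(1, max_iter + 1):
        value = rs.random_sample()
        if value > target:
            return value, n_iter
    raise RuntimeError("no sample above target within max_iter draws")


# Two identically seeded generators must agree on the result *and* on the
# number of iterations; a silent change in either is an early warning.
first = random_search(np.random.RandomState(1234))
second = random_search(np.random.RandomState(1234))
assert first == second
```

If the stream produced for a given seed changed between NumPy versions, a
stored reference value for `first` would stop matching, which is exactly the
situation the paragraph above worries about.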


/David.


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Ralf Gommers
On Sun, May 24, 2015 at 2:41 PM, Alan G Isaac alan.is...@gmail.com wrote:

 I echo Ralf's question.
 For those who need replicability, the proposed upgrade path seems quite
 radical.


It's not radical, and my question was already answered. Nothing changes if
you are doing:

   np.random.seed(1234)
   np.random.any_random_sample_generator_func()

Values only change if you leave out the call to seed(), which you should
never do if you care about replicability.
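
As a small illustration of the point above, using only the existing
`np.random.seed` and `np.random.random_sample` functions: reseeding the
global state with the same value reproduces the same draws.

```python
import numpy as np

# Seed the hidden global RandomState, then draw.
np.random.seed(1234)
a = np.random.random_sample(3)

# Reseeding with the same value restarts the same stream.
np.random.seed(1234)
b = np.random.random_sample(3)

assert np.array_equal(a, b)
```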

Ralf



 Also, I would prefer to have the new functionality introduced beside the
 existing
 implementation of RandomState, with an announcement that RandomState
 will change in the next major numpy version number.  This will allow
 everyone
 who wants to to change now, without requiring that users attend to minor
 numpy version numbers if they want replicability.

 I think this is what is required by semantic versioning.

 Alan Isaac



 On 5/24/2015 4:59 AM, Ralf Gommers wrote:
  the reasoning on this point is shaky. np.random.seed() is *very* widely
 used, and works fine for a test suite where each test that needs random
  numbers calls seed(...) and is run with nose. Can you explain why you
 need to touch the behavior of the global methods in order to make
  RandomState(version=) work?



Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Nathaniel Smith
On May 24, 2015 2:03 AM, Ralf Gommers ralf.gomm...@gmail.com wrote:

 On Sun, May 24, 2015 at 10:22 AM, Antony Lee antony@berkeley.edu
wrote:

 Hi,

 As mentioned in

 #1450: Patch with Ziggurat method for Normal distribution
 #5158: ENH: More efficient algorithm for unweighted random choice
without replacement
 #5299: using `random.choice` to sample integers in a large range
 #5851: Bug in np.random.dirichlet for small alpha parameters

 some methods on np.random.RandomState are implemented either
non-optimally (#1450, #5158, #5299) or have outright bugs (#5851), but
cannot be easily changed due to backwards compatibility concerns.  While
some have suggested new methods deprecating the old ones (see e.g. #5872),
some consensus has formed around the following ideas (see #5299 for
original discussion, followed by private discussions with @njsmith):

 - Backwards compatibility should only be provided to those who were
explicitly instantiating a seeded RandomState object or reseeding a
RandomState object to a given value, and drawing variates from it: using
the global methods (or a None-seeded RandomState) was already
non-reproducible anyways as e.g. other libraries could be drawing variates
from the global RandomState (of which the free functions in np.random are
actually methods).  Thus, the global RandomState object should use the
latest implementation of the methods.


 The rest of the proposal looks good to me, but the reasoning on this
point is shaky. np.random.seed() is *very* widely used, and works fine for
a test suite where each test that needs random numbers calls seed(...) and
is run with nose. Can you explain why you need to touch the behavior of the
global methods in order to make RandomState(version=) work?

You're absolutely right about it being important to preserve the behavior
of the global functions when seeded, but I think this is just a bug in the
description of the proposal here, not in the proposal itself :-).

If you look at the PR, there's no change to how the global functions work
-- they're still just a transparently thin wrapper around a hidden, global
RandomState object, and thus IIUC changes to RandomState will automatically
apply to the global functions as well.

So with this proposal, an unseeded RandomState uses the latest version -
therefore the global functions, which start out unseeded, start out using
the latest version. If you call .seed() on an existing RandomState object
and pass in a seed but no version= argument, the version gets reset to 0 -
therefore if you call the global seed() function and pass in a seed but no
version= argument, the global RandomState gets reset to version 0 (at least
until the next time seed() is called), and backcompat is preserved.

-n


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Alan G Isaac
On 5/24/2015 8:47 AM, Ralf Gommers wrote:
 Values only change if you leave out the call to seed()


OK, but this claim seems to conflict with the following language:
"the global RandomState object should use the latest implementation of the
methods."
I take it that this is what Nathan meant by
"I think this is just a bug in the description of the proposal here, not in the
proposal itself."

So, is the correct phrasing
"the global RandomState object should use the latest implementation of the
methods, unless explicitly seeded"?

Thanks,
Alan


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Ralf Gommers
On Sun, May 24, 2015 at 10:22 AM, Antony Lee antony@berkeley.edu
wrote:

 Hi,

 As mentioned in

 #1450: Patch with Ziggurat method for Normal distribution
 #5158: ENH: More efficient algorithm for unweighted random choice without
 replacement
 #5299: using `random.choice` to sample integers in a large range
 #5851: Bug in np.random.dirichlet for small alpha parameters

 some methods on np.random.RandomState are implemented either non-optimally
 (#1450, #5158, #5299) or have outright bugs (#5851), but cannot be easily
 changed due to backwards compatibility concerns.  While some have suggested
 new methods deprecating the old ones (see e.g. #5872), some consensus has
 formed around the following ideas (see #5299 for original discussion,
 followed by private discussions with @njsmith):

 - Backwards compatibility should only be provided to those who were
 explicitly instantiating a seeded RandomState object or reseeding a
 RandomState object to a given value, and drawing variates from it: using
 the global methods (or a None-seeded RandomState) was already
 non-reproducible anyways as e.g. other libraries could be drawing variates
 from the global RandomState (of which the free functions in np.random are
 actually methods).  Thus, the global RandomState object should use the
 latest implementation of the methods.


The rest of the proposal looks good to me, but the reasoning on this point
is shaky. np.random.seed() is *very* widely used, and works fine for a test
suite where each test that needs random numbers calls seed(...) and is run
with nose. Can you explain why you need to touch the behavior of the global
methods in order to make RandomState(version=) work?

Ralf


- RandomState(seed) and r = RandomState(...); r.seed(seed) should offer
 backwards-compatibility guarantees (see e.g.
 https://docs.python.org/3.4/library/random.html#notes-on-reproducibility).

 As such, we propose the following improvements to the API:

 - RandomState gains a (keyword-only) parameter, version, also accessible
 as a read-only attribute.  This indicates the version of the methods on the
 object.  The current version of RandomState is retroactively assigned
 version 0.  The latest available version is available as
 np.random.LATEST_VERSION.  Backwards-incompatible improvements to
 RandomState methods can be introduced but increase the LATEST_VERSION.

 - The global RandomState is instantiated as
 RandomState(version=LATEST_VERSION).

 - RandomState() and rs.seed() set the version to LATEST_VERSION.

 - RandomState(seed[!=None]) and rs.seed(seed[!=None]) set the version to
 0.

 A proof-of-concept implementation, still missing tests, is tracked as
 #5911.  It includes the patch proposed in #5158 as an example of how to
 include an improved version of random.choice.

 Comments, and help for writing tests (in particular to make sure backwards
 compatibility is maintained) are welcome.

 Antony Lee



Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Ralf Gommers
On Sun, May 24, 2015 at 11:30 AM, Nathaniel Smith n...@pobox.com wrote:

 So with this proposal, an unseeded RandomState uses the latest version -
 therefore the global functions, which start out unseeded, start out using
 the latest version. If you call .seed() on an existing RandomState object
 and pass in a seed but no version= argument, the version gets reset to 0 -
 therefore if you call the global seed() function and pass in a seed but no
 version= argument, the global RandomState gets reset to version 0 (at least
 until the next time seed() is called), and backcompat is preserved.

 On May 24, 2015 2:03 AM, Ralf Gommers ralf.gomm...@gmail.com wrote:
 
  On Sun, May 24, 2015 at 10:22 AM, Antony Lee antony@berkeley.edu
 wrote:
 
  Hi,
 
  As mentioned in
 
  #1450: Patch with Ziggurat method for Normal distribution
  #5158: ENH: More efficient algorithm for unweighted random choice
 without replacement
  #5299: using `random.choice` to sample integers in a large range
  #5851: Bug in np.random.dirichlet for small alpha parameters
 
  some methods on np.random.RandomState are implemented either
 non-optimally (#1450, #5158, #5299) or have outright bugs (#5851), but
 cannot be easily changed due to backwards compatibility concerns.  While
 some have suggested new methods deprecating the old ones (see e.g. #5872),
 some consensus has formed around the following ideas (see #5299 for
 original discussion, followed by private discussions with @njsmith):
 
  - Backwards compatibility should only be provided to those who were
 explicitly instantiating a seeded RandomState object or reseeding a
 RandomState object to a given value, and drawing variates from it: using
 the global methods (or a None-seeded RandomState) was already
 non-reproducible anyways as e.g. other libraries could be drawing variates
 from the global RandomState (of which the free functions in np.random are
 actually methods).  Thus, the global RandomState object should use the
 latest implementation of the methods.
 
 
  The rest of the proposal looks good to me, but the reasoning on this
 point is shaky. np.random.seed() is *very* widely used, and works fine for
 a test suite where each test that needs random numbers calls seed(...) and
 is run with nose. Can you explain why you need to touch the behavior of the
 global methods in order to make RandomState(version=) work?
 You're absolutely right about it being important to preserve the behavior
 of the global functions when seeded, but I think this is just a bug in the
 description of the proposal here, not in the proposal itself :-). If you
 look at the PR, there's no change to how the global functions work --
 they're still just a transparently thin wrapper around a hidden, global
 RandomState object, and thus IIUC changes to RandomState will automatically
 apply to the global functions as well.

 Thanks for the clarification. Then +1 from me for this proposal.

Ralf


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Alan G Isaac
I echo Ralf's question.
For those who need replicability, the proposed upgrade path seems quite radical.

Also, I would prefer to have the new functionality introduced beside the 
existing
implementation of RandomState, with an announcement that RandomState
will change in the next major numpy version number.  This will allow everyone
who wants to to change now, without requiring that users attend to minor
numpy version numbers if they want replicability.

I think this is what is required by semantic versioning.

Alan Isaac



On 5/24/2015 4:59 AM, Ralf Gommers wrote:
 the reasoning on this point is shaky. np.random.seed() is *very* widely used, 
 and works fine for a test suite where each test that needs random
 numbers calls seed(...) and is run with nose. Can you explain why you need to 
 touch the behavior of the global methods in order to make
 RandomState(version=) work?



Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread josef.pktd
On Sun, May 24, 2015 at 9:08 AM, Alan G Isaac alan.is...@gmail.com wrote:

 On 5/24/2015 8:47 AM, Ralf Gommers wrote:
  Values only change if you leave out the call to seed()


 OK, but this claim seems to conflict with the following language:
 the global RandomState object should use the latest implementation of the
 methods.
 I take it that this is what Nathan meant by
 I think this is just a bug in the description of the proposal here, not
 in the proposal itself.

 So, is the correct phrasing
 the global RandomState object should use the latest implementation of the
 methods, unless explicitly seeded?


that's how I understand it.

I don't see any problems with the clarified proposal for the use cases that
I know of.

Can we choose the version also for the global random state, for example to
fix both version and seed in unit tests, with version > 0?


BTW: I would expect that bug fixes are still exempt from backwards
compatibility.

fixing #5851 should be independent of the version, (without having looked
at the issue)

(If you need to replicate bugs, then use an old version of a package.)

Josef



 Thanks,
 Alan


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread josef.pktd
On Sun, May 24, 2015 at 1:49 PM, Nathaniel Smith n...@pobox.com wrote:

 On May 24, 2015 8:43 AM, josef.p...@gmail.com wrote:
 
  Reminder: we are bottom or inline posting

 Can we stop hassling people about this? Inline replies are a great tool to
 have in your toolkit for complicated technical discussions, but I feel like
 our weird insistence on them has turned into a pointless and exclusionary
 thing. It's not like bottom replying is even any better -- the traditional
 mailing list rule is you trim quotes to just the part you're replying to
 (like this message); quoting the whole thing and replying underneath just
 to give people a bit of exercise for their scrolling finger would totally
 have gotten you flamed too.

 But email etiquette has moved on since the 90s, even regular posters to
 this list violate this rule all the time, it's time to let it go.


It's not a 90's thing and I learned about it around 2009 when I started in
here.
I find it very annoying trying to catch up with a longer thread and the
replies are all over the place.


Anne is a few years older than I in terms of numpy and scipy participation
and this was just intended to be a friendly reminder.

And as BTW: I'm glad Anne is back with scipy.


Josef



 -n



Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Sturla Molden
On 24/05/15 17:13, Anne Archibald wrote:
 Do we want a deprecation-like approach, so that eventually people who
 want replicability will specify versions, and everyone else gets bug
 fixes and improvements? This would presumably take several major
 versions, but it might avoid people getting unintentionally trapped on
 this version.

 Incidentally, bug fixes are complicated: if a bug fix uses more or fewer
 raw random numbers, it breaks repeatability not just for the call that
 got fixed but for all successive random number generations.


If a function has a bug, changing it will change the output of the
function. This is not special to random numbers. If not retaining the
old erroneous output means we break backwards compatibility, then no
bugs can ever be fixed, anywhere in NumPy. I think we need to clarify
what we mean by backwards compatibility for random numbers. What
guarantees should we make from one version to another?


Sturla



Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Robert Kern
On Sun, May 24, 2015 at 7:56 PM, Sturla Molden sturla.mol...@gmail.com
wrote:

 On 24/05/15 20:04, Nathaniel Smith wrote:

  I'm not sure what you're envisioning as needing a deprecation cycle? The
  neat thing about random is that we already have a way for users to say
  that they want replicability -- the use of an explicit seed --

 No, this is not sufficient for random numbers. Random sampling and
 ziggurat generators are examples. If we introduce a change (e.g. a
 bugfix) that will affect the number of calls to the entropy source, just
 setting the seed will in general not be enough to ensure backwards
 compatibility. That is e.g. the case with using ziggurat samplers
 instead of the current transcendental transforms for normal, exponential
 and gamma distributions. While ziggurat is faster (and to my knowledge)
 more accurate, it will also make a different number of calls to the
 entropy source, and hence the whole sequence will be affected, even if
 you do set a random seed.

Please reread the proposal at the top of the thread.

--
Robert Kern


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Anne Archibald
Do we want a deprecation-like approach, so that eventually people who want
replicability will specify versions, and everyone else gets bug fixes and
improvements? This would presumably take several major versions, but it
might avoid people getting unintentionally trapped on this version.

Incidentally, bug fixes are complicated: if a bug fix uses more or fewer
raw random numbers, it breaks repeatability not just for the call that got
fixed but for all successive random number generations.
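
A small sketch of that effect. The two samplers are hypothetical: `sampler_v0`
stands for an old implementation and `sampler_v1` for a fixed one that happens
to consume one extra raw number. Only `RandomState.random_sample` is real
NumPy API here.

```python
import numpy as np


def sampler_v0(rs):
    """Hypothetical old implementation: consumes one raw uniform."""
    return rs.random_sample()


def sampler_v1(rs):
    """Hypothetical fixed implementation: consumes two raw uniforms."""
    u = rs.random_sample()
    v = rs.random_sample()
    return 0.5 * (u + v)


rs_old = np.random.RandomState(42)
rs_new = np.random.RandomState(42)

sampler_v0(rs_old)
sampler_v1(rs_new)

# An unrelated, unchanged call made afterwards now sees a shifted stream,
# so it no longer returns the same value in both runs.
next_old = rs_old.random_sample()
next_new = rs_new.random_sample()
assert next_old != next_new
```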

Anne

On Sun, May 24, 2015 at 5:04 PM josef.p...@gmail.com wrote:

 On Sun, May 24, 2015 at 9:08 AM, Alan G Isaac alan.is...@gmail.com
 wrote:

 On 5/24/2015 8:47 AM, Ralf Gommers wrote:
  Values only change if you leave out the call to seed()


 OK, but this claim seems to conflict with the following language:
 the global RandomState object should use the latest implementation of
 the methods.
 I take it that this is what Nathan meant by
 I think this is just a bug in the description of the proposal here, not
 in the proposal itself.

 So, is the correct phrasing
 the global RandomState object should use the latest implementation of
 the methods, unless explicitly seeded?


 that's how I understand it.

 I don't see any problems with the clarified proposal for the use cases
 that I know of.

 Can we choose the version also for the global random state, for example to
 fix both version and seed in unit tests, with version > 0?


 BTW: I would expect that bug fixes are still exempt from backwards
 compatibility.

 fixing #5851 should be independent of the version, (without having looked
 at the issue)

 (If you need to replicate bugs, then use an old version of a package.)

 Josef



 Thanks,
 Alan


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Robert Kern
On Sun, May 24, 2015 at 7:46 PM, Sturla Molden sturla.mol...@gmail.com
wrote:

 On 24/05/15 17:13, Anne Archibald wrote:
  Do we want a deprecation-like approach, so that eventually people who
  want replicability will specify versions, and everyone else gets bug
  fixes and improvements? This would presumably take several major
  versions, but it might avoid people getting unintentionally trapped on
  this version.
 
  Incidentally, bug fixes are complicated: if a bug fix uses more or fewer
  raw random numbers, it breaks repeatability not just for the call that
  got fixed but for all successive random number generations.

 If a function has a bug, changing it will change the output of the
 function. This is not special for random numbers. If not retaining the
 old erroneous output means we break-backwards compatibility, then no
 bugs can ever be fixed, anywhere in NumPy. I think we need to clarify
 what we mean by backwards compatibility for random numbers. What
 guarantees should we make from one version to another?

The policy thus far has been that we will fix bugs in the distributions and
make changes that allow a strictly wider domain of distribution parameters
(e.g. allowing b==0 where before we only allowed b>0), but we will not make
other enhancements that would change existing good output.

--
Robert Kern


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread josef.pktd
On Sun, May 24, 2015 at 11:13 AM, Anne Archibald archib...@astron.nl
wrote:

 Do we want a deprecation-like approach, so that eventually people who want
 replicability will specify versions, and everyone else gets bug fixes and
 improvements? This would presumably take several major versions, but it
 might avoid people getting unintentionally trapped on this version.

 Incidentally, bug fixes are complicated: if a bug fix uses more or fewer
 raw random numbers, it breaks repeatability not just for the call that got
 fixed but for all successive random number generations.


Reminder: we are bottom or inline posting





 Anne

 On Sun, May 24, 2015 at 5:04 PM josef.p...@gmail.com wrote:

 On Sun, May 24, 2015 at 9:08 AM, Alan G Isaac alan.is...@gmail.com
 wrote:

 On 5/24/2015 8:47 AM, Ralf Gommers wrote:
  Values only change if you leave out the call to seed()


 OK, but this claim seems to conflict with the following language:
 the global RandomState object should use the latest implementation of
 the methods.
 I take it that this is what Nathan meant by
 I think this is just a bug in the description of the proposal here, not
 in the proposal itself.

 So, is the correct phrasing
 the global RandomState object should use the latest implementation of
 the methods, unless explicitly seeded?


 that's how I understand it.

 I don't see any problems with the clarified proposal for the use cases
 that I know of.

 Can we choose the version also for the global random state, for example
 to fix both version and seed in unit tests, with version > 0?


 BTW: I would expect that bug fixes are still exempt from backwards
 compatibility.

 fixing #5851 should be independent of the version, (without having
 looked at the issue)


I skimmed the issue.
In a strict sense it's not really a bug, the user doesn't get wrong
numbers, he or she gets Not A Number.

So there are no current usages that use the function in that range.

Josef




 (If you need to replicate bugs, then use an old version of a package.)

 Josef



 Thanks,
 Alan


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Nathaniel Smith
On May 24, 2015 8:43 AM, josef.p...@gmail.com wrote:

 Reminder: we are bottom or inline posting

Can we stop hassling people about this? Inline replies are a great tool to
have in your toolkit for complicated technical discussions, but I feel like
our weird insistence on them has turned into a pointless and exclusionary
thing. It's not like bottom replying is even any better -- the traditional
mailing list rule is you trim quotes to just the part you're replying to
(like this message); quoting the whole thing and replying underneath just
to give people a bit of exercise for their scrolling finger would totally
have gotten you flamed too.

But email etiquette has moved on since the 90s, even regular posters to
this list violate this rule all the time, it's time to let it go.

-n


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Nathaniel Smith
On May 24, 2015 8:15 AM, Anne Archibald archib...@astron.nl wrote:

 Do we want a deprecation-like approach, so that eventually people who
want replicability will specify versions, and everyone else gets bug fixes
and improvements? This would presumably take several major versions, but it
might avoid people getting unintentionally trapped on this version.

I'm not sure what you're envisioning as needing a deprecation cycle? The
neat thing about random is that we already have a way for users to say that
they want replicability -- the use of an explicit seed -- so we can just
immediately go to the world you describe, where people who seed get to pick
their version (or default to version 0 for backcompat), and everyone else
gets the improvements automatically. Or is this different from what you
meant somehow?

Fortunately we haven't yet run into any really serious bugs in random, like
"oops we're sampling from the wrong distribution" type bugs. Mostly it's
more like "oops this is really inefficient" or "oops this crashes in this
edge case", so there's no real harm in letting people use old versions. If
we did run into a case where we were giving flat out wrong results, then I
guess we'd still want to keep the code around because reproducibility is
still important, but perhaps with a requirement that you pass an extra
argument like I_know_its_broken=True or something so that people couldn't
end up running the broken code accidentally? I guess we'll cross that
bridge when we come to it.

 Incidentally, bug fixes are complicated: if a bug fix uses more or fewer
raw random numbers, it breaks repeatability not just for the call that got
fixed but for all successive random number generations.

Yep. This is why we mostly haven't been able to change behavior at *all*
except in cases where there was a clear error so we know no-one was using
something.

-n


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Sturla Molden
On 24/05/15 20:04, Nathaniel Smith wrote:

 I'm not sure what you're envisioning as needing a deprecation cycle? The
 neat thing about random is that we already have a way for users to say
 that they want replicability -- the use of an explicit seed --

No, this is not sufficient for random numbers. Random sampling and 
ziggurat generators are examples. If we introduce a change (e.g. a 
bugfix) that will affect the number of calls to the entropy source, just 
setting the seed will in general not be enough to ensure backwards 
compatibility. That is e.g. the case with using ziggurat samplers 
instead of the current transcendental transforms for normal, exponential 
and gamma distributions. While ziggurat is faster (and, to my knowledge,
more accurate), it will also make a different number of calls to the
entropy source, and hence the whole sequence will be affected, even if 
you do set a random seed.
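
The effect can be sketched in a few lines. This is an illustration only: it
uses the standard-library `random.Random` as a stand-in for RandomState, and
`sample_old` / `sample_new` are hypothetical samplers (not the actual normal or
ziggurat code) that differ only in how many raw numbers they consume:

```python
import random

def sample_old(rng):
    # Hypothetical original algorithm: one raw draw per variate.
    return rng.random()

def sample_new(rng):
    # Hypothetical replacement: two raw draws per variate.
    return 0.5 * (rng.random() + rng.random())

def first_then_rest(sampler, seed=2015):
    rng = random.Random(seed)
    first = sampler(rng)
    # Unrelated draws made later from the same seeded stream.
    rest = [rng.random() for _ in range(3)]
    return first, rest

_, rest_old = first_then_rest(sample_old)
_, rest_new = first_then_rest(sample_new)

# Same seed, yet every draw after the changed call is shifted by one position.
print(rest_old == rest_new)
print(rest_old[1:] == rest_new[:2])
```

The first comparison shows that the later draws no longer match; the second
shows that they are the same stream, just offset by the extra raw number the
new sampler consumed.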


Sturla



Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Antony Lee
Thanks to Nathaniel who has indeed clarified my intent, i.e. the global
RandomState should use the latest implementation, unless explicitly
seeded.  More generally, the `RandomState` constructor is just a thin
wrapper around `seed` with the same signature, so one can swap the version
of the global functions with a call to `np.random.seed(version=...)`.


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Antony Lee
2015-05-24 13:30 GMT-07:00 Sturla Molden sturla.mol...@gmail.com:

 On 24/05/15 10:22, Antony Lee wrote:

  Comments, and help for writing tests (in particular to make sure
  backwards compatibility is maintained) are welcome.

 I have one comment, and that is what makes random numbers so special?
 This applies to the rest of NumPy too, fixing a bug can sometimes change
 the output of a function.

 Personally I think we should only make guarantees about the data types,
 array shapes, and things like that, but not about the values. Those who
 need a particular version of NumPy for exact reproducibility should
 install the version of Python and NumPy they need. That is why virtual
 environments exist.


I personally agree with this point of view (see original discussion in
#5299, for example); if it was only up to me at least I'd make
RandomState(seed) default to the latest version rather than the original
one (whether to keep the old versions around is another question).  On the
other hand, I see that this long-standing debate has prevented obvious
improvements from being added sometimes for years (e.g. a patch for
Ziggurat normal variates has been lying around since 2010), or led to
potential API duplication in order to fix some clearly undesirable behavior
(dirichlet returning nan being described as "in a strict sense not really
a bug"(!)), so I'm willing to compromise to get this moving forward.

Antony


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Nathaniel Smith
On May 24, 2015 11:04 AM, josef.p...@gmail.com wrote:

 On Sun, May 24, 2015 at 1:49 PM, Nathaniel Smith n...@pobox.com wrote:

 On May 24, 2015 8:43 AM, josef.p...@gmail.com wrote:
 
  Reminder: we are bottom or inline posting

 Can we stop hassling people about this? Inline replies are a great tool
to have in your toolkit for complicated technical discussions, but I feel
like our weird insistence on them has turned into a pointless and
exclusionary thing. It's not like bottom replying is even any better -- the
traditional mailing list rule is you trim quotes to just the part you're
replying to (like this message); quoting the whole thing and replying
underneath just to give people a bit of exercise for their scrolling finger
would totally have gotten you flamed too.

 But email etiquette has moved on since the 90s, even regular posters to
this list violate this rule all the time, it's time to let it go.


 It's not a 90's thing and I learned about it around 2009 when I started
in here.
 I find it very annoying trying to catch up with a longer thread and the
replies are all over the place.


 Anne is a few years older than I in terms of numpy and scipy
participation and this was just intended to be a friendly reminder.

And while I know you didn't mean it this way, I'm guessing that being
immediately greeted by criticism for failing to follow some arbitrary and
inconsistently-applied rule was indeed a strong reminder of what an
unpleasant place FOSS mailing lists can sometimes be, and why someone might
disappear from them for a few years. I think we can do better.

This is pretty off-topic for this thread, though, so let's let it lie
here. If anyone desperately needs to comment further please email me
off-list.

-n


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Sturla Molden
On 24/05/15 10:22, Antony Lee wrote:

 Comments, and help for writing tests (in particular to make sure
 backwards compatibility is maintained) are welcome.

I have one comment, and that is what makes random numbers so special? 
This applies to the rest of NumPy too, fixing a bug can sometimes change 
the output of a function.

Personally I think we should only make guarantees about the data types, 
array shapes, and things like that, but not about the values. Those who 
need a particular version of NumPy for exact reproducibility should 
install the version of Python and NumPy they need. That is why virtual 
environments exist.

I am sure a lot will disagree with me on this. So please don't take this 
as flamebait.


Sturla





Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread josef.pktd
On Sun, May 24, 2015 at 5:09 PM, Antony Lee antony@berkeley.edu wrote:

 2015-05-24 13:30 GMT-07:00 Sturla Molden sturla.mol...@gmail.com:

 On 24/05/15 10:22, Antony Lee wrote:

  Comments, and help for writing tests (in particular to make sure
  backwards compatibility is maintained) are welcome.

 I have one comment, and that is what makes random numbers so special?
 This applies to the rest of NumPy too, fixing a bug can sometimes change
 the output of a function.

 Personally I think we should only make guarantees about the data types,
 array shapes, and things like that, but not about the values. Those who
 need a particular version of NumPy for exact reproducibility should
 install the version of Python and NumPy they need. That is why virtual
 environments exist.


 I personally agree with this point of view (see original discussion in
 #5299, for example); if it was only up to me at least I'd make
 RandomState(seed) default to the latest version rather than the original
 one (whether to keep the old versions around is another question).  On the
 other hand, I see that this long-standing debate has prevented obvious
 improvements from being added sometimes for years (e.g. a patch for
 Ziggurat normal variates has been lying around since 2010), or led to
 potential API duplication in order to fix some clearly undesirable behavior
 (dirichlet returning nan being described as "in a strict sense not really
 a bug"(!)), so I'm willing to compromise to get this moving forward.



It's clearly a different kind of bug than some of the ones we fixed in
the past without backwards compatibility discussion where the distribution
was wrong, i.e. some values shifted so parts have more weight and parts
have less weight.

As I mentioned, I don't see any real problem with the proposal.

Josef




 Antony



[Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Antony Lee
Hi,

As mentioned in

#1450: Patch with Ziggurat method for Normal distribution
#5158: ENH: More efficient algorithm for unweighted random choice without
replacement
#5299: using `random.choice` to sample integers in a large range
#5851: Bug in np.random.dirichlet for small alpha parameters

some methods on np.random.RandomState are either implemented non-optimally
(#1450, #5158, #5299) or have outright bugs (#5851), but cannot be easily
changed due to backwards compatibility concerns.  While some have suggested
new methods deprecating the old ones (see e.g. #5872), some consensus has
formed around the following ideas (see #5299 for original discussion,
followed by private discussions with @njsmith):

- Backwards compatibility should only be provided to those who were
explicitly instantiating a seeded RandomState object or reseeding a
RandomState object to a given value, and drawing variates from it: using
the global methods (or a None-seeded RandomState) was already
non-reproducible anyways as e.g. other libraries could be drawing variates
from the global RandomState (of which the free functions in np.random are
actually methods).  Thus, the global RandomState object should use the
latest implementation of the methods.

- RandomState(seed) and r = RandomState(...); r.seed(seed) should offer
backwards-compatibility guarantees (see e.g.
https://docs.python.org/3.4/library/random.html#notes-on-reproducibility).
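
To illustrate the first point, here is a small sketch using the
standard-library `random` module as a stand-in (its module-level functions are
likewise bound methods of a hidden global instance). `third_party_helper` is a
hypothetical placeholder for any library that quietly draws from the global
generator:

```python
import random

def third_party_helper():
    # Stand-in for library code that draws from the shared global state.
    return random.random()

# Global state: seeding alone does not protect against other consumers.
random.seed(42)
baseline = [random.random() for _ in range(3)]

random.seed(42)
third_party_helper()
disturbed = [random.random() for _ in range(3)]

# Private, explicitly seeded instance: unaffected by the helper.
private = random.Random(42)
expected = [private.random() for _ in range(3)]

private = random.Random(42)
third_party_helper()
isolated = [private.random() for _ in range(3)]

print(baseline == disturbed)   # the global stream was shifted by the helper
print(isolated == expected)    # the private stream was not
```

Only the explicitly seeded private instance gives a sequence the caller fully
controls, which is why the compatibility guarantee is attached to that case.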

As such, we propose the following improvements to the API:

- RandomState gains a (keyword-only) parameter, version, also accessible
as a read-only attribute.  This indicates the version of the methods on the
object.  The current version of RandomState is retroactively assigned
version 0.  The latest version is available as
np.random.LATEST_VERSION.  Backwards-incompatible improvements to
RandomState methods can be introduced, but each must increment LATEST_VERSION.

- The global RandomState is instantiated as
RandomState(version=LATEST_VERSION).

- RandomState() and rs.seed() set the version to LATEST_VERSION.

- RandomState(seed[!=None]) and rs.seed(seed[!=None]) set the version to 0.
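
The rules above can be sketched as a toy class. This is not the code in #5911:
`VersionedRandom` and `choice_without_replacement` are hypothetical names, the
generator is the standard-library `random.Random` rather than RandomState, and
letting an explicit `version=` override the seed-based default is an assumption
about how the two parameters would combine:

```python
import random

LATEST_VERSION = 1  # hypothetical: 0 = original algorithms, 1 = improved

class VersionedRandom:
    """Toy stand-in for the proposed RandomState(seed, *, version=...)."""

    def __init__(self, seed=None, *, version=None):
        self._rng = random.Random()
        self.seed(seed, version=version)

    @property
    def version(self):
        # Read-only: there is deliberately no setter.
        return self._version

    def seed(self, seed=None, *, version=None):
        if version is None:
            # Unseeded streams get the latest algorithms; explicitly seeded
            # streams default to version 0 for backwards compatibility.
            version = LATEST_VERSION if seed is None else 0
        if not 0 <= version <= LATEST_VERSION:
            raise ValueError("unknown version: %r" % (version,))
        self._version = version
        self._rng.seed(seed)

    def choice_without_replacement(self, n, k):
        if self._version == 0:
            # "Old" behaviour: shuffle the whole range, O(n) time and memory.
            pool = list(range(n))
            self._rng.shuffle(pool)
            return pool[:k]
        # "New" behaviour: never materialises the range.
        return self._rng.sample(range(n), k)

print(VersionedRandom().version)               # LATEST_VERSION
print(VersionedRandom(123).version)            # 0
print(VersionedRandom(123, version=1).version) # 1
```

With this dispatch, code that already seeds its own instance keeps its old
stream, while the version 1 path can sample from a very large range (the case
raised in #5299) without building it in memory.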

A proof-of-concept implementation, still missing tests, is tracked as
#5911.  It includes the patch proposed in #5158 as an example of how to
include an improved version of random.choice.

Comments, and help for writing tests (in particular to make sure backwards
compatibility is maintained) are welcome.

Antony Lee