On Mon, Jan 14, 2013 at 11:22 AM, <josef.p...@gmail.com> wrote:
> On Mon, Jan 14, 2013 at 11:15 AM, Olivier Delalleau <sh...@keba.be> wrote:
>> 2013/1/14 Matthew Brett <matthew.br...@gmail.com>:
>>> Hi,
>>>
>>> On Mon, Jan 14, 2013 at 9:02 AM, Dave Hirschfeld
>>> <dave.hirschf...@gmail.com> wrote:
>>>> Robert Kern <robert.kern <at> gmail.com> writes:
>>>>
>>>>>
>>>>> >>> >
>>>>> >>> > One alternative that does not expand the API with two-liners is
>>>>> >>> > to let the ndarray.fill() method return self:
>>>>> >>> >
>>>>> >>> >   a = np.empty(...).fill(20.0)
>>>>> >>>
>>>>> >>> This violates the convention that in-place operations never return
>>>>> >>> self, to avoid confusion with out-of-place operations. E.g.
>>>>> >>> ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus
>>>>> >>> np.sort(), and in the broader Python world, list.sort() versus
>>>>> >>> sorted(), list.reverse() versus reversed(). (This was an explicit
>>>>> >>> reason given for list.sort to not return self, even.)
>>>>> >>>
>>>>> >>> Maybe enabling this idiom is a good enough reason to break the
>>>>> >>> convention ("Special cases aren't special enough to break the rules. /
>>>>> >>> Although practicality beats purity"), but it at least makes me -0 on
>>>>> >>> this...
>>>>> >>>
>>>>> >>
>>>>> >> I tend to agree with the notion that inplace operations shouldn't
>>>>> >> return self, but I don't know if it's just because I've been
>>>>> >> conditioned this way.
>>>>> >> Not returning self breaks the fluid interface pattern [1], as noted in
>>>>> >> a similar discussion on pandas [2], FWIW, though there's likely some
>>>>> >> way to have both worlds.
>>>>> >
>>>>> > Ah-hah, here's the email where Guido officially proclaims that there
>>>>> > shall be no "fluent interface" nonsense applied to in-place operators
>>>>> > in Python, because it hurts readability (at least for Dutch people):
>>>>> > http://mail.python.org/pipermail/python-dev/2003-October/038855.html
>>>>>
>>>>> That's a statement about the policy for the stdlib, and just one
>>>>> person's opinion. You, and numpy, are permitted to have a different
>>>>> opinion.
>>>>>
>>>>> In any case, I'm not strongly advocating for it. Its violation of
>>>>> principle ("no fluent interfaces") is roughly in the same ballpark as
>>>>> np.filled() ("not every two-liner needs its own function"), so I
>>>>> thought I would toss it out there for consideration.
>>>>>
>>>>> --
>>>>> Robert Kern
>>>>>
>>>>
>>>> FWIW I'm +1 on the idea. Perhaps because I just don't see many practical
>>>> downsides to breaking the convention but I regularly see a big issue
>>>> with there being no way to instantiate an array with a particular value.
>>>>
>>>> The one obvious way to do it is to use ones and multiply by the value
>>>> you want. I work with a lot of inexperienced programmers and I see this
>>>> idiom all the time. It takes a fair amount of numpy knowledge to know
>>>> that you should do it in two lines by using empty and setting a slice.
>>>>
>>>> In [1]: %timeit NaN*ones(10000)
>>>> 1000 loops, best of 3: 1.74 ms per loop
>>>>
>>>> In [2]: %%timeit
>>>>    ...: x = empty(10000, dtype=float)
>>>>    ...: x[:] = NaN
>>>>    ...:
>>>> 10000 loops, best of 3: 28 us per loop
>>>>
>>>> In [3]: 1.74e-3/28e-6
>>>> Out[3]: 62.142857142857146
>>>>
>>>>
>>>> Even when not in the mythical "tight loop", setting an array to one and
>>>> then multiplying uses up a lot of cycles - it's nearly 2 orders of
>>>> magnitude slower than what we know they *should* be doing.
>>>>
>>>> I'm agnostic as to whether fill should be modified or new functions
>>>> provided but I think numpy is currently missing this functionality and
>>>> that providing it would save a lot of new users from shooting themselves
>>>> in the foot performance-wise.
>>>
>>> Is this a fair summary?
>>>
>>> => fill(shape, val), fill_like(arr, val) - new functions, as proposed
>>> For: readable, seems to fit a pattern often used, presence in
>>> namespace may clue people into using the 'fill' rather than * val or
>>> + val
>>> Con: a very simple alias for a = ones(shape) ; a.fill(val), maybe
>>> cluttering already full namespace.
>>>
>>> => empty(shape).fill(val) - by allowing return value from arr.fill(val)
>>> For: readable
>>> Con: breaks guideline not to return anything from in-place operations,
>>> no presence in namespace means users may not find this pattern.
>>>
>>> => no new API
>>> For: easy maintenance
>>> Con: harder for users to discover fill pattern, filling a new array
>>> requires two lines instead of one.
>>>
>>> So maybe the decision rests on:
>>>
>>> How important is it that users see these function names in the
>>> namespace in order to discover the pattern "a = ones(shape) ;
>>> a.fill(val)"?
>>>
>>> How important is it to obey guidelines for no-return-from-in-place?
>>>
>>> How important is it to avoid expanding the namespace?
>>>
>>> How common is this pattern?
>>>
>>> On the last, I'd say that the only common use I have for this pattern
>>> is to fill an array with NaN.
>>
>> My 2 cts from a user perspective:
>>
>> - +1 to have such a function. I usually use numpy.ones * scalar
>> because honestly, spending two lines of code for such a basic
>> operation seems like a waste. Even if it's slower and potentially
>> dangerous due to casting rules.
>> - I think having a noun rather than a verb makes more sense since we
>> have numpy.ones and numpy.zeros (and I always read "numpy.empty" as
>> "give me an empty array", not "empty an array").
>> - I agree the name collision with np.ma.filled is a problem. I have no
>> better suggestion though at this point.
>
> np.array_filled(shape, value, dtype) ?
> maybe more verbose, but unambiguous AFAICS
>
> BTW
> GAUSS http://en.wikipedia.org/wiki/GAUSS_(software)
> also has zeros and ones. 1st release 1984
>
> np.array_filled((100, 2), -999, int) ?
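
For concreteness, here is a rough sketch of what such a helper might look like. array_filled is only the hypothetical name floated above, not an existing numpy function, and the dtype default (inferred from the value) is my own assumption. It just wraps the empty-then-fill two-liner:

```python
import numpy as np


def array_filled(shape, value, dtype=None):
    """Hypothetical helper: new array of `shape` with every element set to `value`."""
    if dtype is None:
        # Assumption for this sketch: infer the dtype from the fill value.
        dtype = np.asarray(value).dtype
    a = np.empty(shape, dtype=dtype)  # allocate without initializing
    a.fill(value)                     # in-place fill; returns None
    return a


x = array_filled((100, 2), -999, int)  # integer array of missing-value codes
y = array_filled(5, np.nan)            # float array of NaN
z = np.nan * np.ones(5)                # the slower idiom; same contents as y
```

The result matches the ones-and-multiply idiom, but without the extra pass over the array.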
A quick check of the statsmodels source:

20 occurrences of np.nan * np.ones(...)
50 occurrences of np.empty
  a few filled with values other than nan
  many filled in a loop
  (optimistically, more often used by new contributors)

It's just a two-liner, but if it's a function it hopefully produces better code.

David's argument looks plausible to me.

Josef

>
> Josef
>
>
>>
>> -=- Olivier
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion