Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Mon, Apr 2, 2012 at 7:14 PM, Tim Cera t...@cerazone.net wrote: I think the suggestion is pad(a, 5, mode='mean'), which would be consistent with common numpy signatures. The mode keyword should probably have a default, something commonly used. I'd suggest 'mean', Nathaniel suggests 'zero', I think either would be fine. I can't type fast enough. :-) I should say that I can't type faster than Travis since he has already responded Currently that '5' in the example above is the keyword argument 'pad_width' which defaults to 1. So really the only argument then is 'a'? Everything else is keywords? I missed that in the discussion and I am not sure that it is a good idea. In fact as I am typing this I am thinking that we should have pad_width as an argument. I hate to rely on this, because it tends to get overused, but 'Explicit is better than implicit.' 'pad(a)' would carry a lot of implicit baggage that would mean it would be very difficult to figure out what was going on if reading someone else's code. Someone unfamiliar with the pad routine must consult the documentation to figure out what 'pad(a)' meant whereas pad(a, 'mean', 1), regardless of the order of the arguments, would actually read pretty well. I defer to a 'consensus' - whatever that might mean, but I am actually thinking that the input array, mode/method, and the pad_width should be arguments. The order of the arguments - I don't care. I realize that this thread is around 26 messages long now, but if everyone who is interested in this could weigh in one more time about this one issue. To minimize discussion on the list, you can add a comment to the pull request at https://github.com/numpy/numpy/pull/242 I guess I'll say def pad(arr, width, mode=constant, **kwargs): Or, if we don't want to have a default argument for mode (and maybe we don't -- my suggestion of giving it a default was partly based on the assumption that it was pretty obvious what the default should be!), then I'm indifferent between def pad(arr, width, mode, **kwargs): def pad(arr, mode, width, **kwargs): I definitely don't think width should have a default. -- Nathaniel ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
The idea of using constants instead of strings throughout NumPy is an interesting one, but should be pushed to another thread and not hold up this particular PR. I like the suggestion of Nathaniel. Let's get the PR committed with a single-function interface. I like having the array as the first argument to that function (it is more consistent). They keyword can be called mode or method Tim, what do you think of that? Further developments can happen in a separate PR. -Travis On Mar 31, 2012, at 3:07 PM, Richard Hattersley wrote: 1) The use of string constants to identify NumPy processes. It would seem better to use library defined constants (ufuncs?) for better future-proofing, maintenance, etc. I don't see how this would help with future-proofing or maintenance -- can you elaborate? If this were C, I'd agree; using an enum would have a number of benefits: -- easier to work with than strings (== and switch work, no memory management hassles) -- compiler will notice if you accidentally misspell the enum name -- since you always in effect 'import *', getting access to additional constants doesn't require any extra effort But in Python none of these advantages apply, so I find it more convenient to just use strings. Using constants provides for tab-completion and associated help text. The help text can be particularly useful if the choice of constant affects which extra keyword arguments can be specified. And on a minor note, and far more subjectively (time for another bike-shedding reference!), there's the cleanliness of API. (e.g. Strings don't feel a good match. There are an infinite number of strings, but only a small number are valid. There's nothing machine-readable you can interrogate to find valid values.) Under the hood you'll have to use the string to do a lookup, but the constant can *be* the result of the lookup. Why re-invent the wheel when the language gives it to you for free? Note also that we couldn't use ufuncs here, because we're specifying a rather unusual sort of operation -- there is no ufunc for padding with a linear ramp etc. Using mean as the example is misleading in this respect -- it's not really the same as np.mean. 2) Why does only pad use this style of interface? If it's a good idea for pad, perhaps it should be applied more generally? numpy.aggregate(MEAN, ...), numpy.group(MEAN, ...), etc. anyone? The mode=foo interface style is actually used in other places, e.g., np.linalg.qr. My mistake - I misinterpreted the API earlier, so we're talking at cross-purposes. My comment/question isn't really about pad mode, but about numpy more generally. But it still stands - albeit somewhat hypothetically, since it's hard to imagine such a change taking place. Richard ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Mon, Apr 2, 2012 at 12:09 PM, Travis Oliphant tra...@continuum.iowrote: The idea of using constants instead of strings throughout NumPy is an interesting one, but should be pushed to another thread and not hold up this particular PR. I like the suggestion of Nathaniel. Let's get the PR committed with a single-function interface. I like having the array as the first argument to that function (it is more consistent). They keyword can be called mode or method Tim, what do you think of that? Further developments can happen in a separate PR. Current pull request has a single pad function with the mode/method/whatever you call it as a string OR function as the first argument. pad('mean', a, 5) pad('median', a, 7) pad(paddingfunction, a, 2) ...etc. I like the strings, maybe that is not the best, but yes I would like to defer that discussion. Having the string representation does allow 'pad()' to make some checks on inputs to the built in functions. About whether to have pad('mean', a, 5) or pad(a, 'mean', 5) - I don't care. It seems like we have two votes for the later form (Travis and Nathaniel) and unless others weigh in, I will make the change soonish. Kindest regards, Tim ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Mon, Apr 2, 2012 at 10:36 AM, Tim Cera t...@cerazone.net wrote: On Mon, Apr 2, 2012 at 12:09 PM, Travis Oliphant tra...@continuum.iowrote: The idea of using constants instead of strings throughout NumPy is an interesting one, but should be pushed to another thread and not hold up this particular PR. I like the suggestion of Nathaniel. Let's get the PR committed with a single-function interface. I like having the array as the first argument to that function (it is more consistent). They keyword can be called mode or method Tim, what do you think of that? Further developments can happen in a separate PR. Current pull request has a single pad function with the mode/method/whatever you call it as a string OR function as the first argument. pad('mean', a, 5) pad('median', a, 7) pad(paddingfunction, a, 2) ...etc. I like the strings, maybe that is not the best, but yes I would like to defer that discussion. Having the string representation does allow 'pad()' to make some checks on inputs to the built in functions. About whether to have pad('mean', a, 5) or pad(a, 'mean', 5) - I don't care. It seems like we have two votes for the later form (Travis and Nathaniel) and unless others weigh in, I will make the change soonish. I think the suggestion is pad(a, 5, mode='mean'), which would be consistent with common numpy signatures. The mode keyword should probably have a default, something commonly used. I'd suggest 'mean', Nathaniel suggests 'zero', I think either would be fine. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
I like the strings, maybe that is not the best, but yes I would like to defer that discussion. Having the string representation does allow 'pad()' to make some checks on inputs to the built in functions. About whether to have pad('mean', a, 5) or pad(a, 'mean', 5) - I don't care. It seems like we have two votes for the later form (Travis and Nathaniel) and unless others weigh in, I will make the change soonish. I think the suggestion is pad(a, 5, mode='mean'), which would be consistent with common numpy signatures. The mode keyword should probably have a default, something commonly used. I'd suggest 'mean', Nathaniel suggests 'zero', I think either would be fine. It looks like most agree that pad(a, 5, mode='mean') is the preferred API. In terms of the default, it really depends on the problem as to which one is most sensible. I notice that mode='edge' does not require additional keyword arguments and so perhaps this is the best default.But, I think Tim should make the call on which default he prefers. -Travis Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
I think the suggestion is pad(a, 5, mode='mean'), which would be consistent with common numpy signatures. The mode keyword should probably have a default, something commonly used. I'd suggest 'mean', Nathaniel suggests 'zero', I think either would be fine. I can't type fast enough. :-) I should say that I can't type faster than Travis since he has already responded Currently that '5' in the example above is the keyword argument 'pad_width' which defaults to 1. So really the only argument then is 'a'? Everything else is keywords? I missed that in the discussion and I am not sure that it is a good idea. In fact as I am typing this I am thinking that we should have pad_width as an argument. I hate to rely on this, because it tends to get overused, but 'Explicit is better than implicit.' 'pad(a)' would carry a lot of implicit baggage that would mean it would be very difficult to figure out what was going on if reading someone else's code. Someone unfamiliar with the pad routine must consult the documentation to figure out what 'pad(a)' meant whereas pad(a, 'mean', 1), regardless of the order of the arguments, would actually read pretty well. I defer to a 'consensus' - whatever that might mean, but I am actually thinking that the input array, mode/method, and the pad_width should be arguments. The order of the arguments - I don't care. I realize that this thread is around 26 messages long now, but if everyone who is interested in this could weigh in one more time about this one issue. To minimize discussion on the list, you can add a comment to the pull request at https://github.com/numpy/numpy/pull/242 Kindest regards, Tim ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On the one hand it is nice to be explicit. On the other hand it is nice to have keyword arguments. In this case it is very true that pad(a) would not be very clear. Most clear, though, would be: pad(a, width=5, mode='mean'). You could use keyword arguments with None as the default and raise an error if a correct value is not passed in. -Travis On Apr 2, 2012, at 1:14 PM, Tim Cera wrote: I think the suggestion is pad(a, 5, mode='mean'), which would be consistent with common numpy signatures. The mode keyword should probably have a default, something commonly used. I'd suggest 'mean', Nathaniel suggests 'zero', I think either would be fine. I can't type fast enough. :-) I should say that I can't type faster than Travis since he has already responded Currently that '5' in the example above is the keyword argument 'pad_width' which defaults to 1. So really the only argument then is 'a'? Everything else is keywords? I missed that in the discussion and I am not sure that it is a good idea. In fact as I am typing this I am thinking that we should have pad_width as an argument. I hate to rely on this, because it tends to get overused, but 'Explicit is better than implicit.' 'pad(a)' would carry a lot of implicit baggage that would mean it would be very difficult to figure out what was going on if reading someone else's code. Someone unfamiliar with the pad routine must consult the documentation to figure out what 'pad(a)' meant whereas pad(a, 'mean', 1), regardless of the order of the arguments, would actually read pretty well. I defer to a 'consensus' - whatever that might mean, but I am actually thinking that the input array, mode/method, and the pad_width should be arguments. The order of the arguments - I don't care. I realize that this thread is around 26 messages long now, but if everyone who is interested in this could weigh in one more time about this one issue. To minimize discussion on the list, you can add a comment to the pull request at https://github.com/numpy/numpy/pull/242 Kindest regards, Tim ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
1) The use of string constants to identify NumPy processes. It would seem better to use library defined constants (ufuncs?) for better future-proofing, maintenance, etc. I don't see how this would help with future-proofing or maintenance -- can you elaborate? If this were C, I'd agree; using an enum would have a number of benefits: -- easier to work with than strings (== and switch work, no memory management hassles) -- compiler will notice if you accidentally misspell the enum name -- since you always in effect 'import *', getting access to additional constants doesn't require any extra effort But in Python none of these advantages apply, so I find it more convenient to just use strings. Using constants provides for tab-completion and associated help text. The help text can be particularly useful if the choice of constant affects which extra keyword arguments can be specified. And on a minor note, and far more subjectively (time for another bike-shedding reference!), there's the cleanliness of API. (e.g. Strings don't feel a good match. There are an infinite number of strings, but only a small number are valid. There's nothing machine-readable you can interrogate to find valid values.) Under the hood you'll have to use the string to do a lookup, but the constant can *be* the result of the lookup. Why re-invent the wheel when the language gives it to you for free? Note also that we couldn't use ufuncs here, because we're specifying a rather unusual sort of operation -- there is no ufunc for padding with a linear ramp etc. Using mean as the example is misleading in this respect -- it's not really the same as np.mean. 2) Why does only pad use this style of interface? If it's a good idea for pad, perhaps it should be applied more generally? numpy.aggregate(MEAN, ...), numpy.group(MEAN, ...), etc. anyone? The mode=foo interface style is actually used in other places, e.g., np.linalg.qr. My mistake - I misinterpreted the API earlier, so we're talking at cross-purposes. My comment/question isn't really about pad mode, but about numpy more generally. But it still stands - albeit somewhat hypothetically, since it's hard to imagine such a change taking place. Richard ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
I like where this is going. Driven by a desire to avoid a million different methods on a single class, we've done something similar in our library. So instead of thing.mean() thing.max(...) etc. we have: thing.scrunch(MEAN, ...) thing.scrunch(MAX, ...) etc. Where the constants like MEAN and MAX encapsulate the process to be performed - including a reference to a NumPy/user-defined aggregation function, as well as some other transformation details. We then found we could reuse the same constants in other operations: thing.scrunch(MEAN, ...) thing.squish(MEAN, ...) thing.rolling_squish(MEAN, ...) So I have two minor concerns with the current proposal. 1) The use of string constants to identify NumPy processes. It would seem better to use library defined constants (ufuncs?) for better future-proofing, maintenance, etc. 2) Why does only pad use this style of interface? If it's a good idea for pad, perhaps it should be applied more generally? numpy.aggregate(MEAN, ...), numpy.group(MEAN, ...), etc. anyone? Richard Hattersley On 30 March 2012 02:55, Travis Oliphant tra...@continuum.io wrote: On Mar 29, 2012, at 12:53 PM, Tim Cera wrote: I was hoping pad would get finished some day. Maybe 1.9? You have been a great sport about this process. I think it will result in something quite nice. Alright - I do like the idea of passing a function to pad, with a bunch of pre-made functions in place. Maybe something like: a = np.arange(10) b = pad('mean', a, 2, stat_length=3) where if the first argument is a string, use one of the built in functions. If instead you passed in a function: def padwithzeros(vector, pad_width, iaxis, **kwargs): bvector = np.zeros(pad_width[0]) avector = np.zeros(pad_width[1]) return bvector, avector b = pad(padwithzeros, a, 2) Would that have some goodness? +1 -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Fri, Mar 30, 2012 at 8:35 AM, Richard Hattersley rhatters...@gmail.com wrote: I like where this is going. Driven by a desire to avoid a million different methods on a single class, we've done something similar in our library. So instead of thing.mean() thing.max(...) etc. we have: thing.scrunch(MEAN, ...) thing.scrunch(MAX, ...) etc. Where the constants like MEAN and MAX encapsulate the process to be performed - including a reference to a NumPy/user-defined aggregation function, as well as some other transformation details. We then found we could reuse the same constants in other operations: thing.scrunch(MEAN, ...) thing.squish(MEAN, ...) thing.rolling_squish(MEAN, ...) So I have two minor concerns with the current proposal. 1) The use of string constants to identify NumPy processes. It would seem better to use library defined constants (ufuncs?) for better future-proofing, maintenance, etc. I don't see how this would help with future-proofing or maintenance -- can you elaborate? If this were C, I'd agree; using an enum would have a number of benefits: -- easier to work with than strings (== and switch work, no memory management hassles) -- compiler will notice if you accidentally misspell the enum name -- since you always in effect 'import *', getting access to additional constants doesn't require any extra effort But in Python none of these advantages apply, so I find it more convenient to just use strings. Note also that we couldn't use ufuncs here, because we're specifying a rather unusual sort of operation -- there is no ufunc for padding with a linear ramp etc. Using mean as the example is misleading in this respect -- it's not really the same as np.mean. 2) Why does only pad use this style of interface? If it's a good idea for pad, perhaps it should be applied more generally? numpy.aggregate(MEAN, ...), numpy.group(MEAN, ...), etc. anyone? The mode=foo interface style is actually used in other places, e.g., np.linalg.qr. -- Nathaniel ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Thu, Mar 29, 2012 at 6:53 PM, Tim Cera t...@cerazone.net wrote: If instead you passed in a function: def padwithzeros(vector, pad_width, iaxis, **kwargs): bvector = np.zeros(pad_width[0]) avector = np.zeros(pad_width[1]) return bvector, avector b = pad(padwithzeros, a, 2) Would that have some goodness? I like the idea, but this interface feels undercooked to me. What is iaxis? (I couldn't figure that out from the docstrings in the pull request either.) If padding a matrix, do we need a way to do padding of all rows simultaneously? (Certainly that'd be just as easy for something like padwithzeros, and would let us avoid a python-level for loop.) Why is this function allocating new arrays that will just be copied into the big array and then discarded, instead of filling in the big array directly? (Again, this is a speed issue.) If it's working with single rows at a time, then how will padwithconstant know which constant to pull out of kwargs[constant_value] when it's an array? That's why I thought you might want to submit this as a second pull request, rather than letting it hold up the whole thing. -- Nathaniel ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Fri, Mar 30, 2012 at 5:41 AM, Nathaniel Smith n...@pobox.com wrote: On Thu, Mar 29, 2012 at 6:53 PM, Tim Cera t...@cerazone.net wrote: If instead you passed in a function: def padwithzeros(vector, pad_width, iaxis, **kwargs): bvector = np.zeros(pad_width[0]) avector = np.zeros(pad_width[1]) return bvector, avector b = pad(padwithzeros, a, 2) Would that have some goodness? I like the idea, but this interface feels undercooked to me. What is iaxis? (I couldn't figure that out from the docstrings in the pull request either.) If padding a matrix, do we need a way to do padding of all rows simultaneously? (Certainly that'd be just as easy for something like padwithzeros, and would let us avoid a python-level for loop.) Why is this function allocating new arrays that will just be copied into the big array and then discarded, instead of filling in the big array directly? (Again, this is a speed issue.) If it's working with single rows at a time, then how will padwithconstant know which constant to pull out of kwargs[constant_value] when it's an array? That's why I thought you might want to submit this as a second pull request, rather than letting it hold up the whole thing. Would you suggest committing the current PR and then adding this interface later? Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Fri, Mar 30, 2012 at 1:22 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Mar 30, 2012 at 5:41 AM, Nathaniel Smith n...@pobox.com wrote: On Thu, Mar 29, 2012 at 6:53 PM, Tim Cera t...@cerazone.net wrote: If instead you passed in a function: def padwithzeros(vector, pad_width, iaxis, **kwargs): bvector = np.zeros(pad_width[0]) avector = np.zeros(pad_width[1]) return bvector, avector b = pad(padwithzeros, a, 2) Would that have some goodness? I like the idea, but this interface feels undercooked to me. What is iaxis? (I couldn't figure that out from the docstrings in the pull request either.) If padding a matrix, do we need a way to do padding of all rows simultaneously? (Certainly that'd be just as easy for something like padwithzeros, and would let us avoid a python-level for loop.) Why is this function allocating new arrays that will just be copied into the big array and then discarded, instead of filling in the big array directly? (Again, this is a speed issue.) If it's working with single rows at a time, then how will padwithconstant know which constant to pull out of kwargs[constant_value] when it's an array? That's why I thought you might want to submit this as a second pull request, rather than letting it hold up the whole thing. Would you suggest committing the current PR and then adding this interface later? My suggestion is: Step 1: Change the current PR so that it has only one user-exposed function, something like pad(..., mode=foo), and commit that. Everyone seems to pretty much like that interface, implementing it would take 1 hour of work, and then the basic functionality would be landed and done. Step 2: Add the option to pass a user-defined function as the mode= argument, since there's still uncertainty about the best way to do it and working through uncertainty adds time and risk that shouldn't hold up the parts that we do agree on. Even if we do want to keep around the pad_with_mean, pad_with_median etc. functions as additional user-exposed entry-points, I think the current names in the PR met with objections? (The current names are like np.lib.pad.pad_mean.) -- Nathaniel ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
I rearranged your questions. Why is this function allocating new arrays that will just be copied into the big array and then discarded, instead of filling in the big array directly? (Again, this is a speed issue.) My example in the e-mail was incorrect (sorry about that). The way it actually has always worked is exactly what you describe. If padding a matrix, do we need a way to do padding of all rows simultaneously? (Certainly that'd be just as easy for something like padwithzeros, and would let us avoid a python-level for loop.) The core of pad is apply_along_axis which passes a vector to a function. apply_along_axis will pass each 'row' into a function, then I use apply_along_axis again for each 'column' (which now includes padded values from each 'row') progressing through the axes. The function just works on a vector (rank 1 array). The vector that the function sees has to already have place holder pad values. That is why the function also needs the pad width to know which values to manipulate. So a working thought untested function would be something like: def padwithzeros(vector, pad_width, iaxis, **kwargs): vector[:pad_tuple[0]] = 0 vector[-pad_tuple[1]:] = 0 return vector b = pad(padwithzeros, a, 2) Note that iaxis isn't used! But wait... I like the idea, but this interface feels undercooked to me. What is iaxis? (I couldn't figure that out from the docstrings in the pull request either.) If it's working with single rows at a time, then how will padwithconstant know which constant to pull out of kwargs[constant_value] when it's an array? These questions actually go together. First, keywords like constant_values are the same for all rows in the axis - they are defined for the entire axis. Now iaxis. It is only needed inside the function to use the correct constant_values, end_values, or stat_length. For example something like... def padwithconstants(vector, pad_width, iaxis, **kwargs): vector[:pad_tuple[0]] = kwargs['constant_values'][iaxis][0] vector[-pad_tuple[1]:] = kwargs['constant_values'][iaxis][1] return vector constants = ((10,20), (30,40)) # 10 will be prepended to all rows (axis 0) # 20 will be appended to all rows (axis 0) # 30 b = pad(padwithconstants, a, 2, constant_values=constants) Kindest regards, Tim ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
My suggestion is: Step 1: Change the current PR so that it has only one user-exposed function, something like pad(..., mode=foo), and commit that. Everyone seems to pretty much like that interface, implementing it would take 1 hour of work, and then the basic functionality would be landed and done. This is all done in my working directory. Currently I have 'mode' as the first argument and not a keyword. Could you explain the utility of having it be a keyword, if that is indeed what you were advocating earlier? Step 2: Add the option to pass a user-defined function as the mode= argument, since there's still uncertainty about the best way to do it and working through uncertainty adds time and risk that shouldn't hold up the parts that we do agree on. This is done also. I don't do any checks. If it isn't a string, then I take it to be a function. The function signature is: myfunc(vector, pad_tuple, iaxis, kwds) and it has to return a rank 1 array the same length as the input `vector` Even if we do want to keep around the pad_with_mean, pad_with_median etc. functions as additional user-exposed entry-points, I think the current names in the PR met with objections? (The current names are like np.lib.pad.pad_mean.) Names? Kindest regards, Tim ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Fri, Mar 30, 2012 at 2:20 PM, Tim Cera t...@cerazone.net wrote: My suggestion is: Step 1: Change the current PR so that it has only one user-exposed function, something like pad(..., mode=foo), and commit that. Everyone seems to pretty much like that interface, implementing it would take 1 hour of work, and then the basic functionality would be landed and done. This is all done in my working directory. Cool. Currently I have 'mode' as the first argument and not a keyword. Could you explain the utility of having it be a keyword, if that is indeed what you were advocating earlier? Eh, just that it feels vaguely more consistent with all the other np functions, which generally take arrays as their first argument and sometimes have a mode=string argument. Examples of the latter: np.linalg.qr, np.correlate, np.convolve, np.take, np.put, np.choose And unlike Charles, my first guess at the default would have been zero-padding (isn't that the most common sort of padding?). But both of these are minor quibbles -- bikeshedding, really. My opinion isn't strong. Step 2: Add the option to pass a user-defined function as the mode= argument, since there's still uncertainty about the best way to do it and working through uncertainty adds time and risk that shouldn't hold up the parts that we do agree on. This is done also. I don't do any checks. If it isn't a string, then I take it to be a function. Yes, that's probably the Right Way in python. The function signature is: myfunc(vector, pad_tuple, iaxis, kwds) and it has to return a rank 1 array the same length as the input `vector` From your other mail, this API does make more sense to me now. However, looking at the code, I think the returned vector is ignored, and ideally should be [] so as to prevent apply_along_axis from generating a large temporary matrix that we will in any case discard? Even if we do want to keep around the pad_with_mean, pad_with_median etc. functions as additional user-exposed entry-points, I think the current names in the PR met with objections? (The current names are like np.lib.pad.pad_mean.) Names? I wasn't sure if Charles was talking about merging the current pull request as-is, so I was re-raising the original question you started the thread with: whether we like having the functions referred to like np.lib.pad.pad_mean, or something else would be better (e.g. np.lib.pad_with_mean). If you're changing the PR anyway then you can just ignore that paragraph :-). Cheers, -- Nathaniel ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Wed, Mar 28, 2012 at 6:08 PM, Charles R Harris charlesr.har...@gmail.com wrote: I think there is also a question of using a prefix pad_xxx for the function names as opposed to pad.xxx. If I had it as pad.mean, pad.median, ...etc. then someone could from numpy.pad import * a = np.arange(5) # then you would have a confusing b = mean(a, 2) Because of that I would vote for 'pad.pad_xxx' instead of 'pad.xxx'. In fact at one time I named the functions 'pad.pad_with_xxx' which was a little overkill. Kindest regards, Tim ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Thu, Mar 29, 2012 at 7:55 AM, Tim Cera I think there is also a question of using a prefix pad_xxx for the function names as opposed to pad.xxx. Namespaces are one honking great idea -- let's do more of those! If I had it as pad.mean, pad.median, ...etc. then someone could from numpy.pad import * a = np.arange(5) # then you would have a confusing b = mean(a, 2) exactly why no one should *ever* do import *! pad.mean is not more typing than pad_mean so there in NO reason to use the latter, except to have a C style, rather than python namespaces ANd what if someone DOES want to override mean?: from np.pad import mean and now they can do mean(something) and it's explicite what they mean to do. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
While namespaces are a really good idea, I'm not a big fan of both module namespaces and underscore namespaces. It seems pretty redundant to me to have pad.pad_mean. On the other hand, one could argue that pad.mean could be confused with calculating the mean of a padded array. So, it seems like the function names need to be called something more than just mean, median, etc. Something like padwith_mean, padwith_median, etc. actually makes more sense. Or pad.with_mean, pad.with_median. The with_ in this case is not really a namespace it's an indication of functionality. With that said, NumPy was designed not to be deep in terms of naming.We should work hard to ensure that functions and Classes to be instantiated by the user are no more than 2 levels down (i.e. either in the NumPy namespace or in the numpy.subpackage namespace. So, either we need to move pad into a new subpackage and call things pad.with_mean pad.with_median etc. or keep the functions pad_mean, pad_median, etc. in numpy.lib I don't think this functionality really justifies an entire new sub-package in NumPy, though. So, I would vote for keeping things accessible as numpy.lib.pad_mean -Travis 1) The functions in pad.py be imported into the lib namespace 2) The functions all be renamed to padwith_mean On Mar 29, 2012, at 9:55 AM, Tim Cera wrote: On Wed, Mar 28, 2012 at 6:08 PM, Charles R Harris charlesr.har...@gmail.com wrote: I think there is also a question of using a prefix pad_xxx for the function names as opposed to pad.xxx. If I had it as pad.mean, pad.median, ...etc. then someone could from numpy.pad import * a = np.arange(5) # then you would have a confusing b = mean(a, 2) Because of that I would vote for 'pad.pad_xxx' instead of 'pad.xxx'. In fact at one time I named the functions 'pad.pad_with_xxx' which was a little overkill. Kindest regards, Tim ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Thu, Mar 29, 2012 at 5:13 PM, Travis Oliphant tra...@continuum.io wrote: While namespaces are a really good idea, I'm not a big fan of both module namespaces and underscore namespaces. It seems pretty redundant to me to have pad.pad_mean. On the other hand, one could argue that pad.mean could be confused with calculating the mean of a padded array. So, it seems like the function names need to be called something more than just mean, median, etc. Something like padwith_mean, padwith_median, etc. actually makes more sense. Or pad.with_mean, pad.with_median. The with_ in this case is not really a namespace it's an indication of functionality. Perhaps it should be pad(..., mode=mean) , mode= is only a few more characters than _with_, and this would make it much easier to write functions whose API looks like: wavelet_decompose(..., pad_mode=) Also it would solve the namespace question :-). -- Nathaniel ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Thu, Mar 29, 2012 at 10:38 AM, Nathaniel Smith n...@pobox.com wrote: On Thu, Mar 29, 2012 at 5:13 PM, Travis Oliphant tra...@continuum.io wrote: While namespaces are a really good idea, I'm not a big fan of both module namespaces and underscore namespaces. It seems pretty redundant to me to have pad.pad_mean. On the other hand, one could argue that pad.mean could be confused with calculating the mean of a padded array. So, it seems like the function names need to be called something more than just mean, median, etc. Something like padwith_mean, padwith_median, etc. actually makes more sense. Or pad.with_mean, pad.with_median. The with_ in this case is not really a namespace it's an indication of functionality. Perhaps it should be pad(..., mode=mean) , mode= is only a few more characters than _with_, and this would make it much easier to write functions whose API looks like: wavelet_decompose(..., pad_mode=) Also it would solve the namespace question :-). I like this idea. I'd leave the current functions in the module for those who want to import a particular padding type, but import the pad(..., mode=xxx) into the numpy namespace. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Thu, Mar 29, 2012 at 5:38 PM, Nathaniel Smith n...@pobox.com wrote: On Thu, Mar 29, 2012 at 5:13 PM, Travis Oliphant tra...@continuum.io wrote: While namespaces are a really good idea, I'm not a big fan of both module namespaces and underscore namespaces. It seems pretty redundant to me to have pad.pad_mean. On the other hand, one could argue that pad.mean could be confused with calculating the mean of a padded array. So, it seems like the function names need to be called something more than just mean, median, etc. Something like padwith_mean, padwith_median, etc. actually makes more sense. Or pad.with_mean, pad.with_median. The with_ in this case is not really a namespace it's an indication of functionality. Perhaps it should be pad(..., mode=mean) , mode= is only a few more characters than _with_, and this would make it much easier to write functions whose API looks like: wavelet_decompose(..., pad_mode=) Also it would solve the namespace question :-). -- Nathaniel ...And on looking at the pull request for a bit, it would let us combine 10 distinct single-line functions that (with highly redundant docstrings) take up ~600 lines of code. The interfaces are *almost* identical; the exceptions are: -- reflect and symmetric take and even/odd argument -- pad_constant takes an argument specifying the constant to pad with -- pad_linear_ramp takes an argument specifying the end value of the linear ramp The first two can be easily handled by just combining it into the mode, so we have reflect, reflect_odd, symmetric, symmetric_odd, so that just leaves two mode-specific arguments. I wouldn't feel bad about leaving those in as unused arguments for most modes. We might even want to combine them into a single argument -- normally I'm against such simplifications, but the semantics really are quite similar. And final advantage: we could later extend this interface to also support mode=user_specified_callable for custom padding modes, since this is basically how the underlying code already works. (Though I'm not quite sure the interface for the individual pad mode functions like _mean and _median is the one we'd want to expose to users.) -- Nathaniel ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Thu, Mar 29, 2012 at 5:45 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Thu, Mar 29, 2012 at 10:38 AM, Nathaniel Smith n...@pobox.com wrote: On Thu, Mar 29, 2012 at 5:13 PM, Travis Oliphant tra...@continuum.io wrote: While namespaces are a really good idea, I'm not a big fan of both module namespaces and underscore namespaces. It seems pretty redundant to me to have pad.pad_mean. On the other hand, one could argue that pad.mean could be confused with calculating the mean of a padded array. So, it seems like the function names need to be called something more than just mean, median, etc. Something like padwith_mean, padwith_median, etc. actually makes more sense. Or pad.with_mean, pad.with_median. The with_ in this case is not really a namespace it's an indication of functionality. Perhaps it should be pad(..., mode=mean) , mode= is only a few more characters than _with_, and this would make it much easier to write functions whose API looks like: wavelet_decompose(..., pad_mode=) Also it would solve the namespace question :-). I like this idea. I'd leave the current functions in the module for those who want to import a particular padding type, but import the pad(..., mode=xxx) into the numpy namespace. Your call, but that's about 500 lines of highly-redundant docstrings to maintain, just to save the occasional person from having to type one line: def pad_with_mean(*args, **kwargs): np.lib.pad(*args, mode=mean, **kwargs) The unified implementation really is just PADDERS = {mean: _mean, median: _median, ...} return _loop_across(array, pad_width, _PADDERS[mode], constant_values=constant_values, end_values=end_values) -- Nathaniel ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Mar 29, 2012, at 11:52 AM, Nathaniel Smith wrote: On Thu, Mar 29, 2012 at 5:38 PM, Nathaniel Smith n...@pobox.com wrote: On Thu, Mar 29, 2012 at 5:13 PM, Travis Oliphant tra...@continuum.io wrote: While namespaces are a really good idea, I'm not a big fan of both module namespaces and underscore namespaces. It seems pretty redundant to me to have pad.pad_mean. On the other hand, one could argue that pad.mean could be confused with calculating the mean of a padded array. So, it seems like the function names need to be called something more than just mean, median, etc. Something like padwith_mean, padwith_median, etc. actually makes more sense. Or pad.with_mean, pad.with_median. The with_ in this case is not really a namespace it's an indication of functionality. Perhaps it should be pad(..., mode=mean) , mode= is only a few more characters than _with_, and this would make it much easier to write functions whose API looks like: wavelet_decompose(..., pad_mode=) Also it would solve the namespace question :-). -- Nathaniel ...And on looking at the pull request for a bit, it would let us combine 10 distinct single-line functions that (with highly redundant docstrings) take up ~600 lines of code. The interfaces are *almost* identical; the exceptions are: -- reflect and symmetric take and even/odd argument -- pad_constant takes an argument specifying the constant to pad with -- pad_linear_ramp takes an argument specifying the end value of the linear ramp The first two can be easily handled by just combining it into the mode, so we have reflect, reflect_odd, symmetric, symmetric_odd, so that just leaves two mode-specific arguments. I wouldn't feel bad about leaving those in as unused arguments for most modes. We might even want to combine them into a single argument -- normally I'm against such simplifications, but the semantics really are quite similar. And final advantage: we could later extend this interface to also support mode=user_specified_callable for custom padding modes, since this is basically how the underlying code already works. (Though I'm not quite sure the interface for the individual pad mode functions like _mean and _median is the one we'd want to expose to users.) I like the direction this conversation is going. A single pad function in numpy or numpy.lib is a good idea. -Travis -- Nathaniel ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
I was hoping pad would get finished some day. Maybe 1.9? Alright - I do like the idea of passing a function to pad, with a bunch of pre-made functions in place. Maybe something like: a = np.arange(10) b = pad('mean', a, 2, stat_length=3) where if the first argument is a string, use one of the built in functions. If instead you passed in a function: def padwithzeros(vector, pad_width, iaxis, **kwargs): bvector = np.zeros(pad_width[0]) avector = np.zeros(pad_width[1]) return bvector, avector b = pad(padwithzeros, a, 2) Would that have some goodness? Kindest regards, Tim ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Thu, Mar 29, 2012 at 11:53 AM, Tim Cera t...@cerazone.net wrote: I was hoping pad would get finished some day. Maybe 1.9? Well, you *did* ask ;-) Alright - I do like the idea of passing a function to pad, with a bunch of pre-made functions in place. Maybe something like: a = np.arange(10) b = pad('mean', a, 2, stat_length=3) where if the first argument is a string, use one of the built in functions. I like Nathaniel's idea of making the string an optional argument, that way you can also give it a default value ('mean' ?). If instead you passed in a function: def padwithzeros(vector, pad_width, iaxis, **kwargs): bvector = np.zeros(pad_width[0]) avector = np.zeros(pad_width[1]) return bvector, avector b = pad(padwithzeros, a, 2) Would that have some goodness? Yeah, I think we're converging on something nice. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Mar 29, 2012, at 12:53 PM, Tim Cera wrote: I was hoping pad would get finished some day. Maybe 1.9? You have been a great sport about this process. I think it will result in something quite nice. Alright - I do like the idea of passing a function to pad, with a bunch of pre-made functions in place. Maybe something like: a = np.arange(10) b = pad('mean', a, 2, stat_length=3) where if the first argument is a string, use one of the built in functions. If instead you passed in a function: def padwithzeros(vector, pad_width, iaxis, **kwargs): bvector = np.zeros(pad_width[0]) avector = np.zeros(pad_width[1]) return bvector, avector b = pad(padwithzeros, a, 2) Would that have some goodness? +1 -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Wed, Mar 28, 2012 at 9:31 PM, Tim Cera t...@cerazone.net wrote: I have been developing a set of pad functions to pad arrays in different ways. Really close to having it accepted into numpy, but I want to revisit an implementation issue that I have become worried about. Should these functions be collected into a 'pad' namespace or put raw into np.lib? Do I really care about this? Not really since it isn't a utility issue, but I contend that it would look better to pull these functions into their own namespace. +1 one for a separate 'pad' or 'padding' namespace. Adding 10 functions to the main numpy namespace feels like too much. Ralf Is there a consensus? The current pull request: https://github.com/numpy/numpy/pull/242 Kindest regards, Tim ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib
On Wed, Mar 28, 2012 at 1:57 PM, Ralf Gommers ralf.gomm...@googlemail.comwrote: On Wed, Mar 28, 2012 at 9:31 PM, Tim Cera t...@cerazone.net wrote: I have been developing a set of pad functions to pad arrays in different ways. Really close to having it accepted into numpy, but I want to revisit an implementation issue that I have become worried about. Should these functions be collected into a 'pad' namespace or put raw into np.lib? Do I really care about this? Not really since it isn't a utility issue, but I contend that it would look better to pull these functions into their own namespace. +1 one for a separate 'pad' or 'padding' namespace. Adding 10 functions to the main numpy namespace feels like too much. I think there is also a question of using a prefix pad_xxx for the function names as opposed to pad.xxx. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion