On Tue, May 3, 2011 at 6:58 PM, Michael Katz <[email protected]> wrote:
> I have a basic question about applying functions to array elements. There
> is a rambling introduction here and then I get to the ACTUAL QUESTION about
> 2/3 of the way down.
>
> RAMBLING INTRODUCTION
>
> I was trying to do a simple mapping of array values to other values. So I
> had:
>
>     unmapped_colors = np.array((0, 0, 1, 1, 2, 0, 0, 1))
>
>     BNA_LAND_FEATURE_CODE = 0
>     BNA_WATER_FEATURE_CODE = 1
>
>     # color_to_int() produces some unsigned int value
>     green = color_to_int( 0.25, 0.5, 0, 0.75 )
>     blue = color_to_int( 0.0, 0.0, 0.5, 0.75 )
>     gray = color_to_int( 0.5, 0.5, 0.5, 0.75 )
>
>     def map_type_to_color( t ):
>         if t == BNA_LAND_FEATURE_CODE:
>             return green
>         elif t == BNA_WATER_FEATURE_CODE:
>             return blue
>         else:
>             return gray
>
>     mapped_colors = np.vectorize( map_type_to_color )( unmapped_colors )
>
> I found that by using vectorize I got about a 3x speedup compared to using
> a python for loop to loop through the array. One small question: I wonder
> where that speedup comes from, i.e., what vectorize is actually doing for
> me. Is it just the time for the actual looping logic? That seems like too
> much speedup. Does it also include faster access to the array data items?
>
> So I was trying to find a "pure numpy" solution for this. I then learned
> about fancy indexing and boolean indexing, and saw that I could do a
> boolean array version:
>
>     mapped_colors = np.zeros(unmapped_colors.shape, dtype=np.uint32) + gray
>     mapped_colors[unmapped_colors == BNA_LAND_FEATURE_CODE] = green
>     mapped_colors[unmapped_colors == BNA_WATER_FEATURE_CODE] = blue
>
> and in fact I could do a "fancy indexing" version:
>
>     color_array = np.array((green, blue, gray), dtype=np.uint32)
>     colors = color_array[unmapped_colors]
>
> The boolean array version is pretty simple, but makes three "passes" over
> the data.
>
> The fancy indexing version is simple and one pass, but it relies on the
> fact that my constant values happen to be nice to use as array indexes.
> And in fact to use it in actual code I'd need to do one or more other
> passes to check unmapped_colors for any indexes < 0 or > 2.
>
> ACTUAL QUESTION
>
> So, the actual question here is: even though I was able to find a couple
> of solutions for this particular problem, I am wondering in general about
> applying arbitrary functions to each array element in a "pure numpy" way,
> ideally using just a single pass.
>
> It seems to me like there should be some way to build up a "numpy
> compiled" function that you could then apply to all elements in an array.
> Is that what "universal functions" are about? That's what it sounded like
> at first, but after looking closer it didn't seem like that.
>
> I guess I was hoping vectorize was some magical thing that could compile
> certain python expressions down into "numpy primitives". It seems like as
> long as you stuck to a limited set of math and logical operations on
> arrays, even if those expressions included calls to subroutines that were
> similarly restricted, there should be an automated way to convert such
> expressions into the equivalent of a numpy built-in. I guess it's similar
> to cython, but it wouldn't require a separate file or special compilation.
> I could just say
>
>     def capped_sqrt( x ):
>         if x > 100:
>             return 10
>         else:
>             return sqrt( x )
>
>     f = numpy.compile( lambda x: 0 if x < 10 else capped_sqrt( x ) )
>     numpy.map( f, a )
>
> or something like that, and it would all happen in a single pass within
> numpy, with no "python code" being run.
> Is that something that exists?
>
> I realize that for any function like this I could build it out of a
> sequence of individual numpy operations on the whole array, but it seems
> that in general that requires doing several passes, creating multiple
> temp arrays, and computing functions for elements whose results will
> later be thrown out. This might be just because I'm a numpy newbie, but
> it's not clear to me how to do the example above on an array of size N
> without computing N square roots, even if the number of elements in my
> array between 10 and 100 is much smaller than N. Granted, I could set
> things up so that by the time sqrt was applied it would only be
> computing, say, sqrt( 0 ) for all those values that were originally < 10
> or > 100. But still, it would be nice not to compute unnecessarily at
> all. And efficiency concerns aside, it just seems unfortunate,
> language-wise, to always have to think cleverly about some sequence of
> whole-array transformations instead of just being able to think about
> the "single pass computation" that is in your head (at least in my head)
> in the first place and is after all the most direct and non-redundant
> way to get the result.
>
> Michael

I can't comment much on the questions you raise, but I can certainly
suggest that the np.where() function might be a much better choice than
some of the if-statements.

Ben Root
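
For reference, a sketch of what Ben's np.where() suggestion might look
like for the color-mapping example above. The color values here are
placeholders standing in for the color_to_int() results, which aren't
shown in the post:

    import numpy as np

    unmapped_colors = np.array((0, 0, 1, 1, 2, 0, 0, 1))

    BNA_LAND_FEATURE_CODE = 0
    BNA_WATER_FEATURE_CODE = 1

    # placeholder color values; the original code builds these with
    # color_to_int(), which isn't shown here
    green, blue, gray = np.uint32(1), np.uint32(2), np.uint32(3)

    # np.where(cond, a, b) chooses elementwise between a and b; nesting
    # two calls reproduces the three-way if/elif/else in one expression
    mapped_colors = np.where(
        unmapped_colors == BNA_LAND_FEATURE_CODE, green,
        np.where(unmapped_colors == BNA_WATER_FEATURE_CODE, blue, gray))

Note that the inner np.where() still materializes a temporary array, so
this isn't literally single-pass; np.select() expresses the same branching
without nesting when there are more than a few cases.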
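Michael's capped_sqrt() question maps fairly directly onto np.piecewise(),
which applies each function only to the elements its condition selects, so
sqrt is never evaluated for the out-of-range values. A minimal sketch with
made-up data:

    import numpy as np

    a = np.array([2.0, 5.0, 16.0, 64.0, 150.0, 400.0])

    # conditions are tested elementwise; the extra entry at the end of
    # the function list (np.sqrt) is the default, applied wherever no
    # condition matched, i.e. to the values between 10 and 100
    result = np.piecewise(a, [a < 10, a > 100], [0, 10, np.sqrt])
    # result -> [ 0.  0.  4.  8.  10.  10.]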
