Dear Mike,

Thanks for the feedback.

On Mon, Jun 28, 2010 at 12:51 PM, Michael Droettboom <md...@stsci.edu> wrote:
> What are the implications of this with respect to memory usage? When
> working with large arrays, if the intermediate values of a number of
> functions are kept around (whether we want to access them or not) could
> this not lead to excessive memory usage? Maybe this behavior should
> only apply when (as you suggest in the counterexample) a "locals=True"
> kwarg is passed in.
I've been thinking about it, but I haven't decided on a final
implementation yet. I find it a bit messy to add a new kwarg to the
signature of an existing function, as it might conflict with an existing
*args argument. For example, redefining f(x, *args) as
f(x, locals=True, *args) would break code calling f as f(1, 2, 3),
because the second positional argument would be bound to 'locals'
instead of ending up in args.

There are several alternatives:
1) add to the wrapping class a property to switch the behavior of the
   decorator on and off
2) introduce a naming convention (e.g., variables whose name begins
   with '_' are not saved)
3) have an option to dump the local variables to a file

The solution I prefer so far is the second, but since I have never had
this problem in my own code, I'm not sure which option would be most
useful in practice.

>
> It seems like a lot of the maintainability issues raised in the
> counterexample could be solved by returning a dictionary or a bunch [1]
> instead of a tuple -- though that still (without care on the part of the
> user) has the "keeping around references to too much stuff" problem.
>
> [1]
> http://code.activestate.com/recipes/52308-the-simple-but-handy-collector-of-a-bunch-of-named/

It's true that the counter-example is slightly unrealistic, although I
have seen similar bits of code in real-life examples. Using a decorator
is an advantage when dealing with code defined in a third-party library.

Pietro

>
> Mike
>
> On 06/28/2010 12:35 PM, Pietro Berkes wrote:
>> Dear everybody,
>>
>> This message belongs only marginally to a numpy-related mailing list,
>> but I thought it might be of interest here since it addresses what I
>> believe is a common pattern in scientific development. My apologies if
>> that is not the case...
>>
>> The code can be found at http://github.com/pberkes/persistent_locals
>> and requires the byteplay library
>> (http://code.google.com/p/byteplay/).
>>
>> The problem
>> ===========
>>
>> In scientific development, functions often represent complex data
>> processing algorithms that transform input data into a desired output.
>> Internally, the function typically requires several intermediate
>> results to be computed and stored in local variables.
>>
>> As a simple toy example, consider the following function, which
>> takes three arguments and returns True if the sum of the arguments is
>> smaller than their product:
>>
>>     def is_sum_lt_prod(a,b,c):
>>         sum = a+b+c
>>         prod = a*b*c
>>         return sum<prod
>>
>> A frequently occurring problem is that the developer/final user may
>> need to access the intermediate results at a later stage, in order to
>> analyze the detailed behavior of the algorithm, for debugging, or to
>> write more comprehensive tests.
>>
>> A possible solution would be to re-define the function and return the
>> needed internal variables, but this would break the existing code. A
>> better solution would be to add a keyword argument to return more
>> information:
>>
>>     def is_sum_lt_prod(a,b,c, internals=False):
>>         sum = a+b+c
>>         prod = a*b*c
>>         if internals:
>>             return sum<prod, sum, prod
>>         else:
>>             return sum<prod
>>
>> This would keep the existing code intact, but only moves the problem
>> to later stages of the development. If the developer later needs
>> access to even more local variables, the code has to be modified
>> again, and part of the calling code breaks. Moreover, this style
>> leads to ugly code like
>>
>>     res, _, _, _, var1, _, var3 = f(x)
>>
>> where most of the returned values are irrelevant.
>>
>> Proposed solution
>> =================
>>
>> The proposed solution consists of a decorator that makes the local
>> variables accessible from a function attribute, 'locals'. For example:
>>
>>     @persistent_locals
>>     def is_sum_lt_prod(a,b,c):
>>         sum = a+b+c
>>         prod = a*b*c
>>         return sum<prod
>>
>> After calling the function (e.g.
>> is_sum_lt_prod(2,1,2), which returns
>> False) we can analyze the intermediate results as
>>     is_sum_lt_prod.locals
>>     -> {'a': 2, 'b': 1, 'c': 2, 'prod': 4, 'sum': 5}
>>
>> This style is cleaner, is consistent with the principle of identifying
>> the value returned by a function as the output of an algorithm, and is
>> robust to changes in the needs of the researcher.
>>
>> Note that the local variables are saved even in case of an exception,
>> which turns out to be quite useful for debugging.
>>
>> How it works
>> ============
>>
>> The local variables in the inner scope of a function are not easily
>> accessible. One solution (which I have not tried) may be to use
>> tracing code like the one used in a debugger. This, however, would
>> have a considerable cost in time.
>>
>> The proposed approach is to wrap the function in a callable object,
>> and modify its bytecode by adding an external try...finally statement
>> as follows:
>>
>>     def f(self, *args, **kwargs):
>>         try:
>>             ... old code ...
>>         finally:
>>             self.locals = locals().copy()
>>             del self.locals['self']
>>
>> The reason for wrapping the function in a class, instead of saving the
>> locals in a function attribute directly, is that there are all sorts
>> of complications in referring to itself from within a function. For
>> example, referring to the attribute as f.locals results in the
>> bytecode looking for the name 'f' in the namespace, and therefore
>> moving the function, e.g. with
>>     g = f
>>     del f
>> would break 'g'. There are even more problems for functions defined in
>> a closure.
>>
>> I tried modifying f.func_globals with a custom dictionary which keeps a
>> reference to f.func_globals, adding a static element to 'f', but this
>> does not work, as the Python interpreter does not access the
>> func_globals dictionary through Python-level calls but directly with
>> PyDict_GetItem (see
>> http://osdir.com/ml/python.ideas/2007-11/msg00092.html).
>> It is thus
>> impossible to re-define __getitem__ to return 'f' as needed. Ideally,
>> one would like to define a new closure for the function with a cell
>> variable containing the reference, but this is impossible at present
>> as far as I can tell.
>>
>> An alternative solution (see persistent_locals_with_kwarg in deco.py)
>> is to change the signature of the function with an additional keyword
>> argument f(arg1, arg2, _self_ref=f). However, this approach breaks
>> functions that define an *args argument.
>>
>> Cost
>> ====
>> The increase in execution time of the decorated function is minimal.
>> Given its domain of application, most of the functions will take a
>> significant amount of time to complete, making the cost of the
>> decoration negligible:
>>
>>     import time
>>     import deco
>>     def f(x):
>>         time.sleep(0.5)
>>         return 2*x
>>
>>     df = deco.persistent_locals(f)
>>
>>     %timeit f(1)
>>     10 loops, best of 3: 501 ms per loop
>>     %timeit df(1)
>>     10 loops, best of 3: 502 ms per loop
>>
>> Conclusion
>> ==========
>>
>> The problem of accessing the intermediate
>> results in an algorithm is a recurrent one in my research, and this
>> decorator turned out to be quite useful on several occasions, and made
>> some of the code much cleaner. Hopefully, it will be useful in other
>> contexts as well!
>>
>> All the best,
>> Pietro Berkes
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
> --
> Michael Droettboom
> Science Software Branch
> Space Telescope Science Institute
> Baltimore, Maryland, USA
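
For comparison, the tracing-based alternative mentioned under "How it works" (the one described there as untried) might look roughly like the sketch below. This is not the byteplay-based decorator from the repository: it swaps the bytecode rewriting for the standard sys.setprofile hook, and the class name persistent_locals_traced is made up for this illustration. According to the Python documentation, the profile hook receives a 'return' event both on a normal return and when an exception propagates out of the function, so the locals should be captured in both cases.

```python
import functools
import sys


class persistent_locals_traced(object):
    """Hypothetical tracing-based variant (an assumption for illustration,
    not the bytecode-rewriting decorator described in the post)."""

    def __init__(self, func):
        self._func = func
        self.locals = {}
        functools.update_wrapper(self, func)

    def __call__(self, *args, **kwargs):
        target_code = self._func.__code__

        def profiler(frame, event, arg):
            # Only react when the wrapped function's own frame is exiting;
            # events from any other frame are ignored.
            if event == 'return' and frame.f_code is target_code:
                self.locals = dict(frame.f_locals)

        previous = sys.getprofile()
        sys.setprofile(profiler)
        try:
            return self._func(*args, **kwargs)
        finally:
            # Restore whatever profile hook was active before the call.
            sys.setprofile(previous)


@persistent_locals_traced
def is_sum_lt_prod(a, b, c):
    sum = a + b + c
    prod = a * b * c
    return sum < prod


result = is_sum_lt_prod(2, 1, 2)
print(result)                 # False, since 5 < 4 does not hold
print(is_sum_lt_prod.locals)  # the five local names and their values
```

Unlike the bytecode approach, the hook is invoked for every call and return that happens while the wrapped function runs, so the overhead grows with the number of Python-level calls made inside it.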