> For example, I wrote a library routine for doing log-linear
> regression. Doing this required computing the derivative of the
> likelihood function, which was a huge nitpicky hassle; took me a few
> hours to work out and debug. But it's still just 10 lines of Python
> code that I needed to figure out once and they're done forever, now.
> I'd have been perfectly happy if I could have gotten those ten lines
> by asking a random unreleased library I pulled off github, which
> depended on heavy libraries like Theano and relied on a mostly
> untested emulator for some particular version of the CPython VM. But
> I'd be less happy to ask everyone who uses my code to install that
> library as well, just so I could avoid having to spend a few hours
> doing math. This isn't a criticism of your library or anything, it's
> just that I'm always going to be reluctant to rely on an automatic
> differentiation tool that takes arbitrary code as input, because it
> almost certainly cannot be made fully robust. So it'd be nice to have
> the option to stick a human in the loop.
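For what it's worth, such a hand-derived gradient really can be compact. Here is a rough sketch, assuming purely for illustration that the log-linear model is a Poisson regression with a log link (the actual model and routine described above may well differ):

    import numpy as np

    def loglinear_grad(beta, X, y):
        # Illustrative only. Log-likelihood, up to a constant in y:
        #   l(beta) = sum_i [ y_i * dot(x_i, beta) - exp(dot(x_i, beta)) ]
        eta = X.dot(beta)        # linear predictor
        mu = np.exp(eta)         # mean under the log link
        return X.T.dot(y - mu)   # gradient dl/dbeta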
Log-linear models are by definition too simple to show the benefits of automatic differentiation. Try computing the Hessian by hand for a modestly sized multilayer neural network, or for a large graphical model, and you will start seeing the advantages.

That said, I do have my own reservations about auto-diff. Until we have a smart enough compiler that does common subexpression elimination, and in fact even then, hand-written differentiation code will often turn out to be more efficient: terms cancel out (through subtraction or division), terms factorize, and terms can be rearranged into an efficient Horner scheme. It would take very smart symbolic manipulation of the parse tree to get all of that. So in places where I really need to optimize the derivative code I would still do it by hand, and delegate to an AD system when the size gets unwieldy. In theory a good compromise is to let the AD tool churn out the code and then hand-optimize it; readable output does indeed help there.

As far as correctness of the computed derivative is concerned, comparing the dot product of the gradient with a direction vector against the secant computed numerically from the function values along that direction does guard against gross errors (see the P.S. below for a small example). If I remember correctly the scipy optimization module already has a function for such sanity checks. Of course it cannot guarantee correctness, but it usually goes a long way.

-- srean
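P.S. The scipy function I had in mind is, I believe, scipy.optimize.check_grad, which compares the full analytic gradient against a finite-difference approximation. A minimal sketch on a toy function, chosen purely for illustration:

    import numpy as np
    from scipy.optimize import check_grad

    def f(x):
        return x[0] ** 2 + np.exp(x[1])

    def grad_f(x):
        # Analytic gradient of f; this is what an AD tool (or a hand
        # derivation) would produce and what we want to sanity-check.
        return np.array([2.0 * x[0], np.exp(x[1])])

    x0 = np.array([1.0, 0.5])
    # check_grad returns the 2-norm of the difference between grad_f(x0)
    # and a finite-difference estimate of the gradient; a tiny value means
    # the two agree, a large one flags a gross error in the derivative code.
    print(check_grad(f, grad_f, x0))

The directional (dot product) variant described above works the same way: pick a direction d and compare numpy.dot(grad_f(x0), d) against (f(x0 + h*d) - f(x0)) / h for a small h.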