So, I'd like to capture a backtrace whenever we make a SQL query or HTTP service call.
I think this should be ok, because we don't *want* thousands of queries or thousands of backend service calls. We want efficient batch based operations, because doing a query or making an HTTP call is at least 1ms itself. The extra data this would generate would help us more quickly identify the call paths leading to horrendous query counts. I've done some rough profiling - see the attached file for methodology. It looks like grabbing and stringifying a 20 frame deep backtrace on amd64 is about 1ms: doing 1000 takes 1 second on my laptop. So, doing this naively would add 1 second to a page that does 1000 queries; and 100ms to a page that does 100 queries. Given that a page doing 1000 queries is almost certainly failing *now*, I think doing this naively is probably ok. However, we could do some things to limit the impact. I'm trying to decide if they are worth doing. Idea 1: Backtrace on the first 50 queries. There should be some repetition in there, and after that stop getting backtraces. Idea 2: Only start capturing backtraces after the first 2 seconds of request time - small fast pages won't be penalised at all, and slower ones will be. Optionally combine with idea 1. Idea 3: capture 5% of queries randomly, allowing a broad overview of behaviour across the whole request, without imposing 100ms on a 100 query page. At this point I'm planning on just naively capturing everything: good pages with 30 or so queries will hardly notice the overhead (in todays world), and poor pages with > 30 queries will notice more, but already need fixing - and the act of fixing their query count will intrinsically fix their measurement overheads. -Rob
#!/usr/bin/env python import timeit # This generates a 20 deep stackframe, and then calls a function that makes a # stack trace, 1000 times - to simulate a deep appserver stack + oops capturing # and formating tracebacks. code = """import traceback oops = [] def stackframe(depth=20): l = locals() for i in range(10): l[str(i)] = i if depth: stackframe(depth-1) else: for i in range(1000): oops.append(''.join(traceback.format_stack())) stackframe() """ code2 = """import traceback oops = [] def stackframe(depth=20): l = locals() for i in range(10): l[str(i)] = i if depth: stackframe(depth-1) else: for i in range(1000): oops.append('') stackframe() """ # Time with tracebacks t = timeit.Timer(code) print t.repeat(number=10) # Time without t = timeit.Timer(code2) print t.repeat(number=10)
_______________________________________________ Mailing list: https://launchpad.net/~launchpad-dev Post to : launchpad-dev@lists.launchpad.net Unsubscribe : https://launchpad.net/~launchpad-dev More help : https://help.launchpad.net/ListHelp