Turns out that all the memory was taken by the url_times cache. In 300k 
requests, we have nearly 1k different URLs! All of this to generate the top 
200. So I now limit the size of that cache to 50 times the top_n we are looking 
for, and cull it by 10% every time we reach the limit.
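
For reviewers who want the idea in isolation, here is a minimal sketch of a
size-limited cache with periodic culling. It is not the code in the branch:
the class name BoundedTimesCache, its methods, and the choice to cull the
entries with the smallest accumulated time are all assumptions made for
illustration. Only the 50x top_n limit and the 10% cull come from the
description above.

```python
import heapq


class BoundedTimesCache:
    """Hypothetical cache mapping a URL to its accumulated request time."""

    def __init__(self, top_n, size_factor=50, cull_ratio=0.10):
        # Limit the cache to size_factor * top_n entries.
        self.max_size = top_n * size_factor
        # Number of entries dropped each time the limit is reached.
        self.cull_count = max(1, int(self.max_size * cull_ratio))
        self.times = {}

    def add(self, url, seconds):
        # Only a new URL can grow the cache, so only then check the limit.
        if url not in self.times and len(self.times) >= self.max_size:
            self._cull()
        self.times[url] = self.times.get(url, 0.0) + seconds

    def _cull(self):
        # Assumption: the entries least likely to reach the top N are the
        # ones with the smallest accumulated time, so drop those.
        smallest = heapq.nsmallest(
            self.cull_count, self.times.items(), key=lambda item: item[1])
        for url, _ in smallest:
            del self.times[url]

    def top(self, n):
        # Return the n (url, total_time) pairs with the largest totals.
        return heapq.nlargest(
            n, self.times.items(), key=lambda item: item[1])
```

With top_n=200 this caps the cache at 10000 entries and drops 1000 of them
on each cull. A URL that is culled early and becomes busy later restarts
from zero, which is where the accuracy trade-off comes from.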

I've tried various sizes and cull ratios, and this seems a good 
size/time/accuracy compromise. Generating the report now takes a constant 137M 
of memory, and the top200 report is identical to one generated with all of the 
URLs kept in the cache.

It also reduced the time to generate the report by 9 seconds.

I also now persist the RequestTimes data structure to disk. This will allow us 
to generate aggregate reports very quickly. That structure takes ~1.8M of 
disk space once compressed.
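
As a rough illustration of persisting a structure in compressed form, the
sketch below pickles an object through bz2. The on-disk format used in the
branch is not stated here, so the use of pickle and bz2, and the helper names
save_times and load_times, are assumptions for illustration only.

```python
import bz2
import pickle


def save_times(obj, path):
    """Pickle obj into a bz2-compressed file at path (hypothetical helper)."""
    with bz2.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_times(path):
    """Load an object previously written by save_times."""
    with bz2.open(path, "rb") as f:
        return pickle.load(f)
```

An aggregate report could then load several saved structures and merge them
instead of parsing the raw logs again.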
-- 
https://code.launchpad.net/~flacoste/launchpad/ppr-constant-memory/+merge/39666
Your team Launchpad code reviewers is requested to review the proposed merge of 
lp:~flacoste/launchpad/ppr-constant-memory into lp:launchpad/devel.
