Hello Peter,

thank you for the fast reply. I will have a look at the API.

In the meantime, if you configure a long TTL in Varnish (I guess only the main page <http://snapshot.debian.org/package/linux/> is going to change), I think I will stick with this implementation of the crawler, because it is already part of a build process (I'm just adding a new URL to parse).

Looking forward to your reply.

Regards,
L.
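For reference, a minimal sketch of how the crawler could ask the machine-usable API for the list of versions instead of parsing the generated HTML. The /mr/ base path and the JSON shape are assumptions to check against the documentation linked on the website, the helper names are hypothetical, and the sample payload is made up so the sketch runs without touching the network.

```python
import json
from typing import List
from urllib.parse import quote

# Assumption: base path of the machine-usable API. Verify it against the
# documentation linked from snapshot.debian.org before relying on it.
API_BASE = "http://snapshot.debian.org/mr"


def package_versions_url(package: str) -> str:
    """Build the (assumed) endpoint that lists the versions of a source package."""
    return f"{API_BASE}/package/{quote(package, safe='')}/"


def parse_versions(payload: str) -> List[str]:
    """Extract version strings from a JSON document shaped like
    {"result": [{"version": "..."}, ...]} (assumed response shape)."""
    data = json.loads(payload)
    return [entry["version"] for entry in data.get("result", [])]


# Hypothetical sample response; the second version is a placeholder.
SAMPLE = (
    '{"package": "linux", "result": ['
    '{"version": "4.6~rc3-1~exp1"}, {"version": "x.y.z-hypothetical"}]}'
)

if __name__ == "__main__":
    print(package_versions_url("linux"))
    print(parse_versions(SAMPLE))
```

In the real build step the payload would come from a single HTTP GET on the URL returned by package_versions_url, which should be much cheaper for the server than rendering one HTML page per version.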
On Wed, Jul 13, 2016 at 10:54 AM, Peter Palfrader <[email protected]> wrote:

> On Wed, 13 Jul 2016, Luigi Tagliamonte wrote:
>
> > I'm Luigi, a sysadmin who works for sysdig <http://www.sysdig.org/>.
> > I saw that you are the developer and maintainer of
> > snapshot.debian.org. I'm writing a crawler to get all the old Debian
> > linux-image and linux-kernel deb packages, to be able to pre-compile
> > a kernel probe for the sysdig project.
> >
> > I noticed that the crawler is really slow, and I did some profiling
> > with cProfile (I'm using Python).
> >
> > Most of the time is spent in the open function that grabs the HTML
> > from the website.
>
> The HTML stuff is autogenerated, and generating it is expensive. There
> is a machine usable API that gives you more info and is cheaper to
> generate.
>
> > I was wondering if there are actions on your side that you can take
> > to improve the performance of the website, like adding a CDN or a
> > Varnish cache, or spotting some bottleneck that you may have on your
> > side?
>
> It's already supposed to be behind varnish, but it seems that one of the
> frontends was not correctly configured. Fixed that, thanks.
>
> > Here is an example of the time spent from an AWS instance in the
> > us-east-1 region to grab a page from snapshot.debian.org (as you can
> > see it took 20s):
> > [root@ip-10-10-1-128 ~]# curl -o /dev/null http://snapshot.debian.org/package/linux/4.6~rc3-1~exp1/
>
> Doing these requests automatically seems unwise. Please use the API -
> documentation link on the website.
>
> You might also try to see if something that has already been written,
> like debsnap, does not already serve your need.
>
> Cheers,
> --
>                            |  .''`.       ** Debian **
>       Peter Palfrader      | : :' :      The  universal
> https://www.palfrader.org/ | `. `'      Operating System
>                            |   `-    https://www.debian.org/

--
Luigi
---
“The only way to get smarter is by playing a smarter opponent.”
