Hello Peter,
Thank you for the fast reply. I will have a look at the API. In the
meantime, if you configure a long TTL in Varnish (only the main page
<http://snapshot.debian.org/package/linux/> is going to change, I guess), I
think I will stick with this implementation of the crawler, because it is
already part of a build process (I'm just adding a new URL to parse).
Looking forward to your reply.
Regards
L.
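
For reference, the profiling mentioned in the quoted message below can be sketched roughly as follows. This is only a minimal illustration, not the actual crawler: fetch_page, parse_links, crawl and profile_crawl are hypothetical names, and fetch_page merely sleeps in place of the real HTTP request so that the sketch runs without network access.

```python
import cProfile
import io
import pstats
import time


def fetch_page(url):
    # Hypothetical stand-in for the real HTTP request (for example
    # urllib.request.urlopen); it only sleeps, so no network is needed.
    time.sleep(0.05)
    return "<html><a href='4.6~rc3-1~exp1/'>4.6~rc3-1~exp1</a></html>"


def parse_links(html):
    # Trivial placeholder for the parsing step: collect href values.
    return [part.split("'")[0] for part in html.split("href='")[1:]]


def crawl(urls):
    # Fetch every URL and gather the links found on each page.
    return [link for url in urls for link in parse_links(fetch_page(url))]


def profile_crawl(urls):
    """Run crawl() under cProfile and return (links, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    links = crawl(urls)
    profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time so the slowest call chains come first.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    return links, out.getvalue()


if __name__ == "__main__":
    links, report = profile_crawl(["http://snapshot.debian.org/package/linux/"])
    print(report)
```

In a report like this, the fetch step would be expected to dominate the cumulative time, which matches what I observed with the real crawler.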

On Wed, Jul 13, 2016 at 10:54 AM, Peter Palfrader <[email protected]> wrote:

> On Wed, 13 Jul 2016, Luigi Tagliamonte wrote:
>
> > I'm Luigi, a sysadmin who works for sysdig <http://www.sysdig.org/>. I saw
> > that you are the developer and maintainer of snapshot.debian.org. I'm
> > writing a crawler to get all the old Debian linux-image and linux-kernel
> > deb packages, to be able to pre-compile a kernel probe for the sysdig
> > project.
> >
> > I noticed that the crawler is really slow, and I did some profiling with
> > cProfile (I'm using Python).
> >
> > Most of the time is spent in the open function that grabs the HTML from
> > the website.
>
> The HTML stuff is autogenerated, and generating it is expensive.  There
> is a machine usable API that gives you more info and is cheaper to
> generate.
>
>
> > I was wondering if there are actions on your side that you can take to
> > improve the performance of the website, like adding a CDN or a Varnish
> > cache, or spotting some bottleneck that you may have on your side?
>
> It's already supposed to be behind varnish, but it seems that one of the
> frontends was not correctly configured.  Fixed that, thanks.
>
>
> > Here is an example of the time spent from an AWS instance in the
> > us-east-1 region to grab a page from snapshot.debian.org (as you can
> > see, it took 20s):
> > [root@ip-10-10-1-128 ~]# curl -o /dev/null http://snapshot.debian.org/package/linux/4.6~rc3-1~exp1/
>
> Doing these requests automatically seems unwise.  Please use the API -
> documentation link on the website.
>
>
> You might also try to see if something that has already been written,
> like debsnap, does not already serve your need.
>
> Cheers,
> --
>                             |  .''`.       ** Debian **
>       Peter Palfrader       | : :' :      The  universal
>  https://www.palfrader.org/ | `. `'      Operating System
>                             |   `-    https://www.debian.org/
>



-- 
Luigi
---
“The only way to get smarter is by playing a smarter opponent.”
