Re: [Launchpad-dev] Error during Launchpad environment setup (was Fwd: Traceback)

2009-11-19 Thread Jeroen Vermeulen

Michael Hudson wrote:


Would it make sense to have a zc.buildout-managed postgres instance?
It's not hard to set up a local cluster, under the current system user's
id, with a Unix-domain socket--something I've been wanting for a long
time but never got around to working on.  The "all your db are belong to
us" assumptions we have now are really ugly.


Yes please.  This is related to bug #107371 (Make the test suite able
to be run in parallel on a single machine,
https://bugs.edge.launchpad.net/launchpad-foundations/+bug/107371).


Stub made a very sensible observation about this that I'd missed 
completely: there's nothing stopping us from setting up separate, 
user-owned database clusters _using the system-wide installed postgres_. 
 We can run it using our own private configuration files etc.


So no buildout needed, really.  We'd still get pretty much all the benefits.
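Stub's suggestion boils down to running initdb and pg_ctl as the current user, pointing them at a private data directory and a Unix-domain socket. A minimal sketch of the commands a test harness might issue (paths, flags, and the helper name are illustrative, not existing Launchpad code; note the socket option is spelled unix_socket_directory on PostgreSQL releases before 9.3):

```python
def cluster_commands(cluster_dir, socket_dir):
    """Return the argv lists to create and start a private, user-owned
    PostgreSQL cluster that listens only on a Unix-domain socket."""
    # initdb creates a cluster owned by the current user; --auth=trust
    # is acceptable because only this user can reach the socket dir.
    initdb = ["initdb", "-D", cluster_dir, "--auth=trust"]
    # Empty listen_addresses disables TCP entirely; clients connect
    # through the socket in socket_dir instead.
    start = [
        "pg_ctl", "-D", cluster_dir, "start",
        "-o", "-c listen_addresses='' -c unix_socket_directories=" + socket_dir,
    ]
    return initdb, start
```

Running several such clusters with different socket directories is what would let parallel test runs coexist on one machine.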


Jeroen

___
Mailing list: https://launchpad.net/~launchpad-dev
Post to : launchpad-dev@lists.launchpad.net
Unsubscribe : https://launchpad.net/~launchpad-dev
More help   : https://help.launchpad.net/ListHelp


Re: [Launchpad-dev] Redesigning lazr.restful support for AJAX

2009-11-19 Thread Francis J. Lacoste
On November 18, 2009, Gavin Panella wrote:
 On Wed, 18 Nov 2009 14:06:43 -0500
 
 Francis J. Lacoste francis.laco...@canonical.com wrote:
  On November 18, 2009, Maris Fogels wrote:
   On 09-11-16 06:06 PM, Francis J. Lacoste wrote:
1) We want to be able to retrieve one or multiple page fragments
after making a model change through the web service (either PATCH or
named operation).
   
   (For example, subscribing a user means that maybe two or three
parts of the page will need to be updated.)
  
   Does anyone have any other examples of when this would be useful?
 
  Adding a team member inline requires updating three or four parts of the
  page. (Salgado and Bjorn have the details).
 
  It's a pretty common pattern.
 
2) We want to reuse the presentation logic already coded in the
regular web UI views.
   
   (We already have views that know how to render the different parts
that need to be updated on the page.)
   
3) We want to provide a very productive API for front-end developers
to use. (Minimize the amount of web-service-specific glue needed,
minimize the number of confusing asynchronous requests that need to
be made).
   
So one solution we found was basically to drop the way HTML
representations are currently implemented for something much simpler
and much more powerful.
   
Instead of wiring individual web-service resources one-to-one to an
HTML representation on the server side, we should basically let the AJAX
client decide what should be returned from the web service call.
   
The client would request that one or more views be rendered after the
successful completion of the API call and their results returned to
the client.
   
We could use the accept-extension of the Accept header to specify the
list of views whose results to return.
   
Something like:
   
Accept:
application/x-page-fragments;views=subscribers=+subscriber-portlets,
count=+subscribers-count
   
That should return a JSON dictionary containing two keys: subscribers
containing the result of the +subscriber-portlets view and count
containing the result of the +subscribers-count view.
   
That removes the wiring that has to be done now server side to map
these existing views to the HTML representation of fields, removes
the problematic limitation of having only HTML representation and
allows us to retrieve efficiently all the fragments we need on the
page.
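For illustration, here is how a client might build that Accept header and fan the returned JSON dictionary out to per-fragment handlers. This is only a sketch of the proposal above, not existing lazr.restful API; the function names are made up:

```python
import json

def fragments_accept_header(views):
    """Build the proposed Accept header from ordered (key, view) pairs,
    e.g. [("subscribers", "+subscriber-portlets"),
          ("count", "+subscribers-count")]."""
    spec = ",".join("%s=%s" % pair for pair in views)
    return "application/x-page-fragments;views=" + spec

def dispatch_fragments(body, handlers):
    """Parse the JSON dictionary of rendered fragments and hand each
    fragment's HTML to the callback registered for its key."""
    for key, html in json.loads(body).items():
        handlers[key](html)
```

The point of the design is visible here: the server needs no per-field wiring, because the client names the views it wants rendered.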
  
   I assume you are calling the webservice as you would normally, but you
   are adding your special Accept header?
 
  Yes, the idea is that the client controls the result part.
 
   If that is the case, then it still feels a bit strange.  You are taking
   the API model and namespace, and mapping that onto a set of page
   fragment names. However, this set of fragments actually represents the
   Launchpad website URL model, not the API model.  You are telling the
  webservice "Get me a bug comment, but using this magic word, return it
  to me as it would appear on the page at /+bugs/1234/."
 
  A fragment is a view.
 
  The common use case is not 'Get me a bug comment', but 'Create a bug
  comment and give me the result formatted through '@@bug-comment-details'.
 
  Or 'Subscribe this user' and give me back '@@+portlet-subscribers,
  @@subscribers-count'.
 
 If we expect each widget to know what page fragments to update it'll
 become a tight bundle of spaghetti pretty quickly, especially if a
 widget is used on more than one page.
 
 Could the following pattern (probably has a name, but I don't know it)
 help avoid it, such that a widget A doesn't have to care about the
 presence or absence of widget B?
 
 widget A is going to add a subscriber,
 widget A says to client api machinery:
 add subscriber, and get me +subscriber-fragment,
 widget B see this, is interested in subscriber changes,
 widget B says to client api machinery:
 count me in, get +subscriber-xyz for me too,
 client api machinery sends request to launchpad:
 add subscriber, get +subscriber-fragment, +subscriber-xyz
 client api machinery dispatches results to widget A and widget B.
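Gavin's pattern could be sketched like this (all names hypothetical; in real life the send callable would be the web-service request carrying the Accept header):

```python
class FragmentBatcher:
    """Client-side machinery: widgets register interest in fragments,
    one request goes to the server, results fan back out."""

    def __init__(self, send):
        # send(operation, view_names) -> dict mapping view name to HTML.
        self.send = send
        self.interested = []

    def request(self, view_name, callback):
        # Widget B "counts itself in" here without knowing that
        # widget A initiated the operation.
        self.interested.append((view_name, callback))

    def run(self, operation):
        # One round-trip for everything every widget asked for.
        views = [name for name, _ in self.interested]
        results = self.send(operation, views)
        for name, callback in self.interested:
            callback(results[name])
        self.interested = []
```

Widget A never learns of widget B's existence; both only talk to the batcher.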
 

That would be a very nice addition, but I wouldn't implement it in the first 
iteration.  Let's first see whether that nightmare actually happens.

 ISTR that Christian Heilmann spoke of something like this at the Epic?
 
   If you want a real loop, look at it this way:  you map the database
   model to the webservice WADL addresses, and you map the database model
   to the website URL addresses.  URLs are rendered by views.  OK.  We are
   suggesting that views are also rendered by Fragment Names.  Also OK. 
   But then we are saying that to call the Fragment Name, you have to know
   the correct address, and that address is an object picked out of the
   WADL? Why not use the address you already have, that being the URL of
   the page you are currently visiting?
 
  I think that we should allow 

[Launchpad-dev] Fwd: Results of Loggerhead Bug Investigation

2009-11-19 Thread Francis J. Lacoste

--  Forwarded Message  --

Subject: Results of Loggerhead Bug Investigation
Date: November 19, 2009
From: Max Kanat-Alexander mka...@everythingsolved.com
To: Francis J. Lacoste francis.laco...@canonical.com

Hey Francis.

So, I investigated the memory leak and the codebrowse hanging problem.

The memory leak is just some part of the code leaking a tiny amount of
memory when a specific type of page is requested (I'm not sure which
page yet). The tiny leak grows over days until the process is very
large.  I can reproduce the leak locally. The rest of the work involved
in this would be tracking down where the leak occurs and patching it--I
suspect this will not be a major architectural change, just a fix to
loggerhead or perhaps Paste. However, I think the task of initial
analysis is complete.

The more significant issue is the hangs. The hang is, in a sense, two
separate issues:

1) When a user loads multiple revisions of a very large branch
(launchpad itself, bzr itself, or mysql) that doesn't have a revision
graph yet, building the revision graph takes an enormous amount of time
and causes the rest of loggerhead to slow to a crawl, thus causing it to
appear hung for three to five minutes.

2) Loggerhead (or perhaps just a single loggerhead instance) doesn't
scale very well across many branches with many users, partially because
of how the revision graph is currently built and partially (I suspect)
because any given Python process is going to be limited by the Global
Interpreter Lock on how many concurrent requests it can honestly handle.

So the question for this issue is--what level would you like me to
address it on? If you'd like me to simply work on the revision graph
issue, I could do that within the current architecture of loggerhead and
devise a fix. Probably the simplest would be to just place a mutex
around building a revision graph for any one branch. However, that may
not fix the actual *performance* problems seen with codebrowse, it just
might make hangs less likely. A more general approach to loggerhead's
scalability would result in a fix for this and also for any performance
issues that loggerhead sees in the Launchpad environment. A quick search
for "python paste scale" in Google turns up
http://pypi.python.org/pypi/Spawning/ which (after sufficient vetting)
might be a reasonable solution. Then once we have a better single-server
solution, making it scale out to multiple servers (by having a central
store for the revision graph cache and making sure that loggerhead plays
well under load-balancing) would be the next step.
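The per-branch mutex Max mentions could look roughly like this (a sketch, not loggerhead code; all names are made up):

```python
import threading

_locks = {}
_locks_guard = threading.Lock()

def _lock_for(branch_id):
    # setdefault under a guard lock, so two threads never end up
    # holding two different locks for the same branch.
    with _locks_guard:
        return _locks.setdefault(branch_id, threading.Lock())

def build_graph_serialized(branch_id, build_graph):
    """Serialize revision-graph building per branch: concurrent
    requests for the same large branch queue up here instead of each
    starting an expensive build, while other branches are unaffected."""
    with _lock_for(branch_id):
        return build_graph(branch_id)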

Perhaps the best thing would be to come up with a quick patch to save
the LOSAs from having to constantly restart codebrowse, and then once we
have that situation at least mitigated, we could go on to work on the
actual underlying scalability issue.

Does that sound good to you?

-Max
-- 
Max Kanat-Alexander
Chief Engineer
http://www.everythingsolved.com/
Everything Solved: Complete Computer Management

---
-- 
Francis J. Lacoste
francis.laco...@canonical.com




Re: [Launchpad-dev] Immediate plan for Build Farm generic jobs

2009-11-19 Thread Julian Edwards
William Grant wrote:
 The DSP exists as soon as the SPN does. I can traverse and add branches
 (eg. for PPAs), but not file bugs unless there are SPRs published in a
 distro archive.

I'm still not convinced that traversal would work without SPRs,
publications etc., but it would be a pleasant surprise if it did.




[Launchpad-dev] A picture showing why edge can be slow

2009-11-19 Thread Maris Fogels

Hi everyone,

I tried to visit a bugs page today and found the page taking forever to load.
Thankfully I had Firebug running, and the Network tracing was active.  I took a
screenshot of the network activity.  It says a lot.

   http://people.canonical.com/~mars/bugpage-loading-problem.png

Note that this picture was taken on edge.  On edge we suffer through this with
every new revision, as that changes the file URLs and forces us to fetch
everything anew.

Regular LP users only suffer this once every four weeks, and on their first
visit to our site.


For what it's worth, we are working to fix these problems.


Maris







Re: [Launchpad-dev] Immediate plan for Build Farm generic jobs

2009-11-19 Thread Julian Edwards
Michael Hudson wrote:
 Er.  Hm.  I guess I don't know enough to be certain about this, but I
 think when we build a source package we're always going to want to then
 build it for an archive?  I would guess that you can handle
 republication by using the 'copy package' feature that exists already?

Possibly.  I am just thinking of the existing workflow where it's
possible to re-upload the same package to lots of places.  I'd like to
keep that behaviour if we can, in case copy package fails (it's been
known :( )


 I'm not sure what the difference would be, indeed.  What is the
 conceptual difference between a Build and a BuildPackageJob?  I was

The latter is more akin to a BuildQueue, i.e. it encapsulates a
*request* to build.

Build stores information about the build itself, like the log, the
sourcepackagerelease, the distroarchseries, etc.

 under the impression that there was still a split mostly to avoid
 rewriting all of Soyuz -- if not, I'd love to know (and try to remember)
 what it is.

Correct reasoning, wrong tables :)  I didn't want to remove BuildQueue.

 Cool. 'exactly the same thing' even goes as far as handing them to the
 process-upload.py script?

Yep, it just dumps the files in the incoming directory and tells the
upload processor to deal with it.

The only minor difference is that we can't do that asynchronously
without some code changes to process-upload, which depends on the
changes file being signed to report problems.  We need a way of
identifying the person who uploaded it (ie. requested the recipe build).

 One thing I need to change though is to stop this use of Popen since it
 blocks everything else on the buildd-manager.  There's a spec for this
 at
 https://blueprints.edge.launchpad.net/soyuz/+spec/buildd-manager-upload-decoupling
 
 If I read the above right, this isn't actually strictly speaking
 required to have build from recipe working?
 
 I can certainly see how it would be a good idea though.

It's not initially required, no.  However, it's a major scaling blocker
as it prevents anything else happening while we wait for the upload to
get processed - if it's a big package this can be 30-60 seconds.  A
quick intermediate fix would be to replace the use of Popen with a hook
into the Python module itself.  That avoids running initZopeless and
friends on every script invocation, which takes 5 seconds or so, and
would be a significant improvement for small packages that only take a
few seconds to process.

 I think we want to key off recipe, not source package here.  But it
 sounds easy enough (select job.date_finished - job.date_started from
 buildsourcepackagejob, job where buildsourcepackagejob.job = job.id and
 job.status = completed and buildsourcepackagejob.recipe = $RECIPE order
 by job.date_started desc limit 1 or similar).

Yep, now that I understand there's more than one recipe for each
package, that makes perfect sense.

 I think that probably makes sense.  Don't know where to do the signing
 though -- maybe the buildd-master could do it?

It can't, it would need to happen on the builders as it's a dsc file
signature, which in turn affects the changes file.

If it turns out to be too hard, we can live without it though.

J



Re: [Launchpad-dev] Immediate plan for Build Farm generic jobs

2009-11-19 Thread Julian Edwards
Robert Collins wrote:
 On Wed, 2009-11-18 at 16:24 +, Julian Edwards wrote:
  * The recipe is tied to a series.  Recipes should be independent of a
 series. 
 
 Why do you say this? Recipes today are extremely useful in conjunction
 with series (and certainly the version numbers they encode are _highly_
 series specific).

I was wrong, see my other email.



Re: [Launchpad-dev] Immediate plan for Build Farm generic jobs

2009-11-19 Thread Julian Edwards
Robert Collins wrote:
 On Wed, 2009-11-18 at 16:55 +, Julian Edwards wrote:
 However, my second point still stands.  We need to traverse to a
 recipe
 before the source package exists.

 Perhaps:
 /distro/series/+recipe/packagename/name

 Another thing to consider is the owner.  Do we want to make that part
 of
 the key data you need to resolve a single recipe? 
 
 I don't really get your second point.
 
 Are you saying 'people should be able to create new source package names
 by creating a recipe' ?

Yes.

Ultimately, we want to get rid of the old fashioned package uploads and
do this exclusively through branches/recipes.



[Launchpad-dev] Syncing more than just tip from upstream

2009-11-19 Thread Danilo Šegan
Hi all,

I am wondering if we have ever considered syncing more than just tip
from different upstreams?

I know it may make sense with an upstream like GNOME, because:

 * GNOME releases updates on the stable release well after the initial
release, and trunk can move away quite swiftly
 * Very often, maintainers in GNOME branch for a stable release well
before the actual release (sometimes especially because of a string
freeze), doing big changes in the trunk

Why is this important?

 * At least with translations, trunk can start to significantly diverge
in terms of what strings are there (i.e. POT files will change)
 * Sometimes, trunk translations may not be updated throughout the
string freeze period when the stable branch translation is updated, and
we want the latest ones from appropriate branches to get into Ubuntu

I am sure there are other reasons why this would be important and/or
useful, but there are some concerns as well:

 * Does it work today?  I'm going to experiment with a smallish GNOME
project.
 * If it works, does it work optimally?  I.e. will it stack branches so
disk space usage doesn't explode?
 * How are we going to make this happen?

My plan is to try it out and see what works and what doesn't work
already.

Cheers,
Danilo




Re: [Launchpad-dev] Fwd: Results of Loggerhead Bug Investigation

2009-11-19 Thread Stuart Bishop

On Thu, Nov 19, 2009 at 9:54 PM, Francis J. Lacoste

       So the question for this issue is--what level would you like me to
address it on? If you'd like me to simply work on the revision graph
issue, I could do that within the current architecture of loggerhead and
devise a fix. Probably the simplest would be to just place a mutex
around building a revision graph for any one branch. However, that may
not fix the actual *performance* problems seen with codebrowse, it just
might make hangs less likely. A more general approach to loggerhead's
scalability would result in a fix for this and also for any performance
issues that loggerhead sees in the Launchpad environment. A quick search


We could use a mutex or limit the number of simultaneous requests the load 
balancer sends the backend (or both). Even if building the revision graph is 
amazingly fast, we still need this as the failure can still occur under 
sufficient load.

--
Stuart Bishop stu...@stuartbishop.net
http://www.stuartbishop.net/





[Launchpad-dev] Ratings and reviews in Software Center

2009-11-19 Thread Barry Warsaw
At the end of Tuesday's session on ratings and reviews in Software Center for
Lucid, I captured the gobby doc and put it in our dev wiki:

https://dev.launchpad.net/SoftwareCenterRatingsAndReviewsGobbySession

When we start thinking about the cross-team implications of the work we're
requested to do for supporting Software Center, it might make sense to move
this out of the Registry team page.

https://dev.launchpad.net/Registry

For now though, if you're attending other UDS sessions related to Software
Center and its requirements for Launchpad, please make sure the gobby docs get
uploaded to the dev wiki.  We can reorganize them later.

Thanks,
-Barry




Re: [Launchpad-dev] Immediate plan for Build Farm generic jobs

2009-11-19 Thread William Grant
On Thu, 2009-11-19 at 15:07 +, Julian Edwards wrote:
 William Grant wrote:
  The DSP exists as soon as the SPN does. I can traverse and add branches
  (eg. for PPAs), but not file bugs unless there are SPRs published in a
  distro archive.
 
 I'm still not convinced that traversal would work without SPRs,
 publications etc., but it would be a pleasant surprise if it did.

It does. The code ({DistroSeries,Distribution}.getSourcePackage) makes
it fairly clear, and
https://launchpad.net/ubuntu/+source/rdiff-backup1.1.9 works. That
package has never existed outside PPAs.




[Launchpad-dev] YUI 3.0.0 has landed on db-devel

2009-11-19 Thread Maris Fogels

Hi everyone,

Just a heads-up, YUI 3.0.0 has landed on db-devel.

You can now merge db-devel into your in-progress work (Tim), to see if 
everything still functions as expected, or if you want to start work against the 
awesome YUI 3 final release.


There are still a few problems with our JavaScript resulting from the 3.0.0 
upgrade.  I will merge db-devel back into our devel mainline once I get them 
fixed.  The forecast for that is early next week.


Best,
Maris





Re: [Launchpad-dev] Ratings and reviews in Software Center

2009-11-19 Thread Jonathan Lange
On Thu, Nov 19, 2009 at 2:06 PM, Barry Warsaw ba...@canonical.com wrote:
 At the end of Tuesday's session on ratings and reviews in Software Center for
 Lucid, I captured the gobby doc and put it in our dev wiki:

 https://dev.launchpad.net/SoftwareCenterRatingsAndReviewsGobbySession

 When we start thinking about the cross-team implications of the work we're
 requested to do for supporting Software Center, it might make sense to move
 this out of the Registry team page.

 https://dev.launchpad.net/Registry

 For now though, if you're attending other UDS sessions related to Software
 Center and its requirements for Launchpad, please make sure the gobby docs get
 uploaded to the dev wiki.  We can reorganize them later.


There have already been a few. mpt, would you perchance be able to
give us a list of sessions about the Software Center?

jml



Re: [Launchpad-dev] Immediate plan for Build Farm generic jobs

2009-11-19 Thread James Westby
Hi,

Thanks for working on this everyone.

Apologies if I duplicate comments already made elsewhere in the thread.

On Wed Nov 18 00:15:27 -0600 2009 Michael Hudson wrote:
 Separately, we need to decide where a recipe lives.  The current
 thinking is
 https://launchpad.net/ubuntu/karmic/+source/some-package/+recipe/recipe-name,
 which seems OK to me (we'd have to trust a bit that this recipe would
 build a package for some-package in karmic, but that doesn't seem any
 different to say branches today).

Why not under a person? How do you do access control?

 Finally, we could stick an archive on the recipe, but maybe we don't
 want to.  I'll talk about this a bit more later in the mail.

I don't think we want to. They are not archive-specific, and you could
put the recipe in to any archive.

However, knowing which recipe was used to build a package would be
useful.

 One of the things bzr-builder does when it creates the debianised source
 tree is create a manifest, which is a sort of frozen version of a recipe
 -- it references particular revisions of the branches so as to allow a
 repeat of exactly this build.  We could use a manifest like this to
 actually run the recipe: at the point where the build is requested, we
 make the manifest and stuff it into the database.  This seems like a
 neat idea, but isn't how bzr-builder works now as far as I can tell.

It's not. However, I think it's quite a good idea, otherwise you
have to collect the manifest too, or have the collector extract it
from the source package.

The difficulty would be resolving revision ids, as that requires bzr
code, rather than just SQL (I assume).

 I think the current plan is to use bzr-builder to make the debianized
 source tree and bzr-builddeb to then make the source package.

That currently isn't possible, but we plan to fix that. This may be
the preferred way, but we need to decide how they will work together
to be sure.

 I
 presume the process for getting the source package off the builder and
 into the process of being built will follow that of the existing
 builders: the builder will tell the buildd-manager where to get the
 .dsc, the manager will parse this to find the other parts of the package
 and then grab them, shove all of the files into the librarian and
 trigger the existing parts of soyuz to look at them somehow[1].

Will tell it where the .changes is, no? Also, as I said, you may
have to collect a manifest as well.

 In case the above wasn't enough, here's some things I haven't thought
 hard about:
 
  - do people want to subscribe to a recipe?
- does this mean getting notified when the recipe builds or fails to
  build?
- does this mean getting notified when the recipe is changed?

This is very important to get right. I don't know whether subscription
is the right approach, but notifications on problems need to be communicated
to the right people in the right way.

Thanks,

James



[Launchpad-dev] When updating meta-lp-deps, please remember....

2009-11-19 Thread Max Bowsher
When updating meta-lp-deps, please remember

1) To copy the package to all supported series.

2) To tag the upload in bzr. You can use bzr mark-uploaded to
automatically set a tag named from the top entry in debian/changelog.


I've copied the most recent upload to jaunty and added the missing tags
for all the recent uploads.

Max.





Re: [Launchpad-dev] A picture showing why edge can be slow

2009-11-19 Thread Jeroen Vermeulen

Maris Fogels wrote:

Note that this picture was taken on edge.  On edge we suffer through
this with every new revision, as that changes the file URLs and forces
us to fetch everything anew.


I don't suppose there's some way of cheating our way around that for 
unchanged files?



Jeroen



Re: [Launchpad-dev] Fwd: Results of Loggerhead Bug Investigation

2009-11-19 Thread Michael Hudson
Francis J. Lacoste wrote:
 --  Forwarded Message  --
 
 Subject: Results of Loggerhead Bug Investigation
 Date: November 19, 2009
 From: Max Kanat-Alexander mka...@everythingsolved.com
 To: Francis J. Lacoste francis.laco...@canonical.com
 
   Hey Francis.
 
   So, I investigated the memory leak and the codebrowse hanging problem.
 
   The memory leak is just some part of the code leaking a tiny amount of
 memory when a specific type of page is requested (I'm not sure which
 page yet). The tiny leak grows over days until the process is very
 large.  I can reproduce the leak locally. The rest of the work involved
 in this would be tracking down where the leak occurs and patching it--I
 suspect this will not be a major architectural change, just a fix to
 loggerhead or perhaps Paste. However, I think the task of initial
 analysis is complete.

This sounds sane.

   The more significant issue is the hangs. The hang is, in a sense, two
 separate issues:
 
   1) When a user loads multiple revisions of a very large branch
 (launchpad itself, bzr itself, or mysql) that doesn't have a revision
 graph yet, building the revision graph takes an enormous amount of time
 and causes the rest of loggerhead to slow to a crawl, thus causing it to
 appear hung for three to five minutes.

As suspected then, but it sounds worse than I'd guessed.

   2) Loggerhead (or perhaps just a single loggerhead instance) doesn't
 scale very well across many branches with many users, partially because
 of how the revision graph is currently built and partially (I suspect)
 because any given Python process is going to be limited by the Global
 Interpreter Lock on how many concurrent requests it can honestly handle.

Yeah.

   So the question for this issue is--what level would you like me to
 address it on? If you'd like me to simply work on the revision graph
 issue, I could do that within the current architecture of loggerhead and
 devise a fix. Probably the simplest would be to just place a mutex
 around building a revision graph for any one branch.

That's probably a good fix for loggerhead, but maybe not sufficient for
Launchpad.

 However, that may
 not fix the actual *performance* problems seen with codebrowse, it just
 might make hangs less likely. A more general approach to loggerhead's
 scalability would result in a fix for this and also for any performance
 issues that loggerhead sees in the Launchpad environment. A quick search
 for python paste scale in Google turns up
 http://pypi.python.org/pypi/Spawning/ which (after sufficient vetting)
 might be a reasonable solution.

Another team at Canonical tried Spawning and had to give up and go back
to Paste.  So let's learn from their misfortune :)

 Then once we have a better single-server
 solution, making it scale out to multiple servers (by having a central
 store for the revision graph cache and making sure that loggerhead plays
 well under load-balancing) would be the next step.

As Rob pointed out in the bug report, if we can have the load balancer
always direct requests for the same branch to the same loggerhead
backend, we don't need to worry too much about the central store part.

Speaking more generally, the problem is the revision cache -- can we
make it go away, or at least handle it better?  I always forget why we
actually need it, so let's try to recap:

 1. Going from revid -> revno.  Loggerhead does this a lot.
 2. Going from revno -> revid.  Probably done ~once per page.
 3. In History.get_revids_from().  This gets into behaviour territory.
Basically it mainline-izes a bunch of revisions.  It can probably
touch quite a lot of the graph.
 4. get_merge_point_list().  I can't remember what this does :(
 5. get_short_revision_history_by_fileid().  Just uses it to get the set
of all revids in the branch.
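Queries 1 and 2 could in principle be answered by a small store built once from the branch's mainline, without shipping the whole graph anywhere. A toy sketch (ignoring merged revisions, which get dotted revnos in bzr):

```python
class MainlineIndex:
    """Answers revid -> revno and revno -> revid lookups from a
    branch's mainline (a list of revision ids, oldest first)."""

    def __init__(self, mainline):
        pairs = list(enumerate(mainline, 1))
        self._revno = dict((revid, n) for n, revid in pairs)
        self._revid = dict(pairs)

    def revno_for(self, revid):
        return self._revno[revid]

    def revid_for(self, revno):
        return self._revid[revno]
```

A store like this is tiny compared to the full graph, which is what makes serving it over IPC less painful.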

Y'see, one of the problems with a central graph store is that graphs are
big, and any central store implies IPC which implies serialization, and
serializing and deserializing something as big as Launchpad's revision
graph cache is annoyingly slow.  So one idea would be to have this
central store not serve up entire graphs, but instead be able to answer
the questions above.  There would be many problems with this approach of
course -- for example you probably don't want to make a cross procedure
call for every revid -> revno translation loggerhead does, and gathering
all the revids you'd want to translate before you start rendering would
be painful.

On the more serious end, it might be worth pushing the generation of the
cache into the store, though: then it can compute caches in
subprocesses or whatever to maximize CPU utilization, and to maintain
performance of the loggerhead process(es).

Another, probably more tractable, problem would be to be able to
incrementally generate revision caches in the common case of revisions
merely being added to the branch.  If the graph store kept the graphs
as more than just an opaque lump, you could update a cache in place
when new revisions arrive instead of rebuilding it from scratch.
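A toy sketch of what such an incremental update could look like, assuming the cache is just a revid -> revno mapping (a hypothetical structure, not any actual bzr or loggerhead format):

```python
# Hypothetical incremental cache update: when revisions are merely
# appended to the branch, extend the stored mapping instead of
# regenerating it from the whole revision graph.

def extend_cache(revno_by_revid, new_mainline_revids):
    """Append newly-added mainline revisions to an existing cache."""
    next_revno = len(revno_by_revid) + 1
    for revid in new_mainline_revids:
        revno_by_revid[revid] = next_revno
        next_revno += 1
    return revno_by_revid

cache = {"rev-a": 1, "rev-b": 2}
extend_cache(cache, ["rev-c", "rev-d"])
print(cache["rev-d"])  # -> 4
```

History rewrites (uncommit, overwrite pushes) would still force a full rebuild, but those are rare compared to plain appends.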

Re: [Launchpad-dev] Immediate plan for Build Farm generic jobs

2009-11-19 Thread Michael Hudson
James Westby wrote:
 Hi,
 
 Thanks for working on this everyone.
 
 Apologies if I duplicate comments already made elsewhere in the thread.
 
 On Wed Nov 18 00:15:27 -0600 2009 Michael Hudson wrote:
 Separately, we need to decide where a recipe lives.  The current
 thinking is
 "https://launchpad.net/ubuntu/karmic/+source/some-package/+recipe/recipe-name",
 which seems OK to me (we'd have to trust a bit that this recipe would
 build a recipe for some-package in karmic, but that doesn't seem any
 different to say branches today).
 
 Why not under a person? How do you do access control?

Recipes would definitely have an owner for access control.  Whether that
would be part of the URL/namespace is a separate decision.

 Finally, we could stick an archive on the recipe, but maybe we don't
 want to.  I'll talk about this a bit more later in the mail.
 
 I don't think we want to. They are not archive-specific, and you could
 put the recipe in to any archive.
 
 However, knowing which recipe was used to build a package would be
 useful.

Right.

 One of the things bzr-builder does when it creates the debianised source
 tree is create a manifest, which is a sort of frozen version of a recipe
 -- it references particular revisions of the branches so as to allow a
 repeat of exactly this build.  We could use a manifest like this to
 actually run the recipe: at the point where the build is requested, we
 make the manifest and stuff it into the database.  This seems like a
 neat idea, but isn't how bzr-builder works now as far as I can tell.
 
 It's not. However, I think it's quite a good idea, otherwise you
 have to collect the manifest too, or have the collector extract it
 from the source package.
 
 The difficulty would be resolving revision ids, as that requires bzr
 code, rather than just SQL (I assume).

Hm.  At least making the manifest doesn't require running arbitrary code :-)

 I think the current plan is to use bzr-builder to make the debianized
 source tree and bzr-builddeb to then make the source package.
 
 That currently isn't possible, but we plan to fix that. This may be
 the preferred way, but we need to decide how they will work together
 to be sure.

OK, I'll apply a thin SEP[1] film to this issue then for now.  It's
certainly independent from how we model this in the database.

 I presume the process for getting the source package off the builder and
 into the process of being built will follow that of the existing
 builders: the builder will tell the buildd-manager where to get the
 .dsc, the manager will parse this to find the other parts of the package
 and then grab them, shove all of the files into the librarian and
 trigger the existing parts of soyuz to look at them somehow[1].
 
 Will tell it where the .changes is, no?

Er, maybe.  My brain refuses to remember all these debian details :)

 Also, as I said, you may
 have to collect a manifest as well.

Right.  I think the existing soyuz code is fairly flexible in this regard.

 In case the above wasn't enough, here's some things I haven't thought
 hard about:

  - do people want to subscribe to a recipe?
- does this mean getting notified when the recipe builds or fails to
  build?
- does this mean getting notified when the recipe is changed?
 
 This is very important to get right.

OK.

 I don't know whether subscription
 is the right approach, but notifications on problems need to be communicated
 to the right people in the right way.

Thanks for the clear specification :-)

Cheers,
mwh

[1] SEP == someone else's problem, in case anyone hasn't read the
Hitchhiker's Guide to the Galaxy books.

___
Mailing list: https://launchpad.net/~launchpad-dev
Post to : launchpad-dev@lists.launchpad.net
Unsubscribe : https://launchpad.net/~launchpad-dev
More help   : https://help.launchpad.net/ListHelp


Re: [Launchpad-dev] Immediate plan for Build Farm generic jobs

2009-11-19 Thread Michael Hudson
Julian Edwards wrote:
 Michael Hudson wrote:
 Er.  Hm.  I guess I don't know enough to be certain about this, but I
 think when we build a source package we're always going to want to then
 build it for an archive?  I would guess that you can handle
 republication by using the 'copy package' feature that exists already?
 
 Possibly.  I am just thinking of the existing workflow where it's
 possible to re-upload the same package to lots of places.

I guess at the worst you'll be able to download the source package (and
re-sign it?) and then reupload it...

  I'd like to
 keep that behaviour if we can, in case copy package fails (it's been
 known :( )

This sounds like one of those things that isn't required for the first
cut, but that we should understand a bit more so that we don't set
ourselves up for trouble down the track.  Can you explain the use cases
you have in mind?  Is it worth designing the schema so that you can
associate a buildrecipejob with multiple archives, or would that be YAGNI?

 I'm not sure what the difference would be, indeed.  What is the
 conceptual difference between a Build and a BuildPackageJob?  I was
 
 The latter is more akin to a BuildQueue, i.e. it encapsulates a
 *request* to build.
 
 Build stores information about the build itself, like the log, the
 sourcepackagerelease, the distroarchseries, etc.

I thought that Job modelled the job itself including the result of it,
not just the request.  But clearly it doesn't have to...

 under the impression that there was still a split mostly to avoid
 rewriting all of Soyuz -- if not, I'd love to know (and try to remember)
 what it is.
 
 Correct reasoning, wrong tables :)  I didn't want to remove BuildQueue.

Ah.

 Cool. 'exactly the same thing' even goes as far as handing them to the
 process-upload.py script?
 
 Yep, it just dumps the files in the incoming directory and tells the
 upload processor to deal with it.
 
 The only minor difference is that we can't do that asynchronously
 without some code changes to process-upload, which depends on the
 changes file being signed to report problems.  We need a way of
 identifying the person who uploaded it (ie. requested the recipe build).

Ah right.

 One thing I need to change though is to stop this use of Popen since it
 blocks everything else on the buildd-manager.  There's a spec for this
 at
 https://blueprints.edge.launchpad.net/soyuz/+spec/buildd-manager-upload-decoupling
 If I read the above right, this isn't actually strictly speaking
 required to have build from recipe working?

 I can certainly see how it would be a good idea though.
 
 It's not initially required, no.  However, it's a major scaling blocker
 as it prevents anything else happening while we wait for the upload to
 get processed - if it's a big package this can be 30-60 seconds.  A
 quick intermediate fix would be to replace the use of Popen with a hook
 into the python module itself.  This prevents initZopeless and friends
 from running on script invocation which takes 5 seconds or so and would
 be a significant improvement for small packages which only take a few
 seconds to process.

Makes sense.  I just find it helpful to keep "absolutely required to
work at all" and "required to have even slightly acceptable performance"
separated in my head.

 I think we want to key off recipe, not source package here.  But it
 sounds easy enough (select job.date_finished - job.date_started from
 buildsourcepackagejob, job where buildsourcepackagejob.job = job.id and
 job.status = completed and buildsourcepackagejob.recipe = $RECIPE order
 by job.date_started desc limit 1 or similar).
 
 Yep, now I understand that there's more than one recipe for each package
 that makes perfect sense.

Good!
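The pseudo-SQL quoted above can be exercised against a toy schema to check the shape of the query. This is a hedged sketch only: the table and column names follow the email's pseudo-SQL, not Launchpad's actual schema, and sqlite stands in for PostgreSQL.

```python
# Toy version of the "duration of the last completed build for a
# recipe" query.  Table/column names are taken from the pseudo-SQL in
# the email and are NOT the real Launchpad schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (
        id INTEGER PRIMARY KEY,
        status TEXT,
        date_started TEXT,
        date_finished TEXT
    );
    CREATE TABLE buildsourcepackagejob (
        job INTEGER REFERENCES job(id),
        recipe INTEGER
    );
    INSERT INTO job VALUES
        (1, 'completed', '2009-11-19 10:00:00', '2009-11-19 10:05:00'),
        (2, 'completed', '2009-11-19 11:00:00', '2009-11-19 11:02:00');
    INSERT INTO buildsourcepackagejob VALUES (1, 7), (2, 7);
""")

row = conn.execute("""
    SELECT job.date_started, job.date_finished
    FROM buildsourcepackagejob, job
    WHERE buildsourcepackagejob.job = job.id
      AND job.status = 'completed'
      AND buildsourcepackagejob.recipe = ?
    ORDER BY job.date_started DESC
    LIMIT 1
""", (7,)).fetchone()
print(row)  # -> ('2009-11-19 11:00:00', '2009-11-19 11:02:00')
```

Keying on the recipe rather than the source package, as agreed above, is just a matter of which column the WHERE clause filters on.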

 I think that probably makes sense.  Don't know where to do the signing
 though -- maybe the buildd-master could do it?
 
 It can't, it would need to happen on the builders as it's a dsc file
 signature, which in turn affects the changes file.

Ergh.  Maybe it's overly paranoid, but if it has to happen on the slave,
there's not much meaning to signing it with any system-wide key -- I
guess you could sign it with the destination archive key though.

 It if turns out to be too hard, we can live without it though.

Good to know :)

Cheers,
mwh



Re: [Launchpad-dev] Immediate plan for Build Farm generic jobs

2009-11-19 Thread Michael Hudson
Jonathan Lange wrote:
 On Thu, Nov 19, 2009 at 9:36 AM, Julian Edwards
 julian.edwa...@canonical.com wrote:
 Michael Hudson wrote:
 However, my second point still stands.  We need to traverse to a recipe
 before the source package exists.
 I don't really get this.  It sounds like the model poking the UI in the
 eye.  (If we really have to, we can have the appropriate traverse()
 method look ahead in the request and do different things if the next but
 one segment is +recipe).
 Well let's put it another way - you have a new recipe for a new package
 that's not in Launchpad.

 How would you traverse to it?  It seems a bit chicken-and-egg to me.

 Another thing to consider is the owner.  Do we want to make that part of
 the key data you need to resolve a single recipe?
 I don't know (honestly).  It would make the URLs even longer, but would
 probably be more consistent with the rest of Launchpad.
 Yeah that was my worry.

 Jono, any opinion?

 
 I haven't kept up with this thread so far.
 
 My own opinion is that the exact details of traversal don't matter
 yet, just as long as we pick something that allows:
   * recipes to refer to one another.

I still don't understand at all how this specific part works.  If
bzr-builder supports cross recipe references today, it doesn't document
them.

   * finding all of the recipes that are associated with a given source package
   * finding all of the recipes that are associated with a given branch
   * linking to past builds of recipes
   * creating recipes for things (esp. SPNs) that don't exist yet.
 
 There are many traversal paths that satisfy all of these constraints.
 https://launchpad.net/+recipe/$RECIPE_ID being the simplest.

That seems pretty gross.

 The open questions we have, as I see it, are:
   * should recipes have names? (probably yes)

Yes, I think so.

   * what should be the namespaces?

DistroSeriesSourcePackage seems OK to me.  Maybe user +
DistroSeriesSourcePackage?

 In that case, I wonder what's the simplest thing that could possibly
 work. Perhaps I should read the thread :)

Might be good :)

Cheers,
mwh



Re: [Launchpad-dev] Fwd: Results of Loggerhead Bug Investigation

2009-11-19 Thread Michael Hudson
Ian Clatworthy wrote:
 Michael Hudson wrote:
 
 Speaking more generally, the problem is the revision cache -- can we
 make it go away, or at least handle it better?  I always forget why we
 actually need it, so let's try to recap:

  1. Going from revid -> revno.  Loggerhead does this a lot.
  2. Going from revno -> revid.  Probably done ~once per page.
  3. In History.get_revids_from().  This gets into behaviour territory.
 Basically it mainline-izes a bunch of revisions.  It can probably
 touch quite a lot of the graph.
  4. get_merge_point_list().  I can't remember what this does :(
  5. get_short_revision_history_by_fileid().  Just uses it to get the set
 of all revids in the branch.
 
 Can my bzr-historycache plugin help? My understanding is that it's less
 useful than before 2a (and may even need some TLC right now) but it's
 designed to partially fix this sort of thing.

Ah yes, that keeps slipping my mind!  Max, please look at historycache
and see if it can help us :-)

We can probably arrange for the branch puller to make sure that every
branch has a populated history cache btw.

Cheers,
mwh




Re: [Launchpad-dev] Fwd: Results of Loggerhead Bug Investigation

2009-11-19 Thread Jeroen Vermeulen

Francis J. Lacoste wrote:


The memory leak is just some part of the code leaking a tiny amount of
memory when a specific type of page is requested (I'm not sure which
page yet). The tiny leak grows over days until the process is very
large.  I can reproduce the leak locally. The rest of the work involved
in this would be tracking down where the leak occurs and patching it--I
suspect this will not be a major architectural change, just a fix to
loggerhead or perhaps Paste. However, I think the task of initial
analysis is complete.


FWIW we do have one known leak that's being fixed right now:

https://bugs.launchpad.net/storm/+bug/475148


Jeroen
