Re: [Launchpad-dev] Immediate plan for Build Farm generic jobs

Michael Hudson Wed, 18 Nov 2009 15:34:06 -0800

Julian Edwards wrote:
> Michael Hudson wrote:
>  > So, we finally get to think concretely about some things :-)
> 
> I hope it was worth the wait.


Well, it might have been better to have done this thinking last week in
Mooloolaba, but oh well...

>> Or, and there are actually reasons for doing this, we could do something
>> in between: store it mostly as text but replace the references to
>> branches in the text with references to database objects (probably the
>> id of entries in some RecipeBranch linking table).  This would let us
>> (a) check that the branches exist at parsing time (b) keep the
>> references up to date if the branch is moved or renamed (c) prevent
>> branches that are referenced in Recipes from being deleted.
> 
> This seems very reasonable to me.

Hooray.  Glad to see that Tim and Aaron and I are smoking the same crack
as the rest of you guys :-)
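
A minimal sketch of the "mostly text, but with branch references
replaced by database ids" idea, for illustration only.  The BRANCH_IDS
dict is a hypothetical stand-in for a lookup against the branch table,
the regular expression is a guess at what a reference looks like, and
the sample recipe lines are only illustrative, not a statement of the
real recipe format.

```python
import re

# Hypothetical stand-in for a database lookup: branch name -> branch id.
BRANCH_IDS = {"foo": 21435, "foo-packaging": 21436}

# Assumed shape of a branch reference: "lp:" followed by a name.
LP_NAME = re.compile(r"\blp:([A-Za-z0-9~+./_-]+)")
LP_ID = re.compile(r"\blp:(\d+)\b")


def mangle_recipe(text, branch_ids):
    """Replace lp:<name> with lp:<id>.

    Returns the mangled text plus the list of referenced branch ids
    (the rows a RecipeBranch-style linking table would need).  Raises
    LookupError at parse time if a branch does not exist.
    """
    referenced = []

    def replace(match):
        name = match.group(1)
        if name not in branch_ids:
            raise LookupError("unknown branch: lp:%s" % name)
        referenced.append(branch_ids[name])
        return "lp:%d" % branch_ids[name]

    return LP_NAME.sub(replace, text), referenced


def unmangle_recipe(text, branch_ids):
    """Turn lp:<id> back into lp:<name> using the current names."""
    names = {branch_id: name for name, branch_id in branch_ids.items()}
    return LP_ID.sub(lambda m: "lp:%s" % names[int(m.group(1))], text)
```

Because the stored text only holds ids, renaming a branch changes what
unmangle_recipe() displays without touching the stored recipe.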

>> Separately, we need to decide where a recipe lives.  The current
>> thinking is
>> "https://launchpad.net/ubuntu/karmic/+source/some-package/+recipe/recipe-name"
>> which seems OK to me (we'd have to trust a bit that this recipe would
>> build a source package for some-package in karmic, but that doesn't
>> seem any different from, say, branches today).
> 
> I don't think this is good for a few reasons:
>  * The URL is too long
>  * The traversal would fail for a recipe where a source package name
> doesn't exist yet
>  * The recipe is tied to a series.  Recipes should be independent of a
> series.
>  * Why have more than one recipe for one package?
> 
> My suggestion is:
> /<distro>/+recipe/packagename

Well... I'll reply to your followup about this I guess...

>> Finally, we could stick an archive on the recipe, but maybe we don't
>> want to.  I'll talk about this a bit more later in the mail.
> 
> We absolutely don't want to do this, because source packages can exist
> in many archives.  The act of publication of a package is separate from
> its existence.

Well, I knew the above was true, but this morning it seems much more
obvious to maintain this separation in the recipe too.

>> This leads to a schema a bit like:
>>
>> Recipe:
> 
> I think this should be called SourcePackageRecipe, mostly because we
> might have other recipes and somewhat for consistency reasons.

+1 to that.

>>  - id, registrant, date_created, owner, date_last_modified
>>    - all standard launchpad fields.  the owner would be able to edit
>>      the recipe.
>>  - name
>>    - the last bit of the url
> 
> I don't think you need this.
> 
>>  - distroseries, sourcepackagename
>>    - provides the rest of the url
> 
> I would say you need "distribution" instead of distroseries.

Well, this is all tied in a pretty simple way to the "where does a
recipe live" question, so once we sort that out this is easy.

>>  - recipe
>>    - a text field containing the text of the recipe (probably with
>> mangled branch references so lp:foo would be replaced with lp:21435)
>>
>> RecipeBranch:
>>  - id, recipe, branch
>>    - all obvious i hope
>>
>> What follows hopefully doesn't depend too much on how the above gets
>> decided in the end.
>>
>> For the job of building a recipe into a source package we'll have a
>> BuildSourcePackageFromRecipeJob table.  I foresee this table looking like:
>>
>> BuildSourcePackageFromRecipeJob
>>  - job
>>  - recipe
>>  - archive?
> 
> I don't know if archive is going to be necessary here because it ties
> the creation of a source package to one archive.  We might want to take
> an existing recipe build job and re-upload it to a different archive.

Er.  Hm.  I guess I don't know enough to be certain about this, but I
think when we build a source package we're always going to want to then
build it for an archive?  I would guess that you can handle
republication by using the 'copy package' feature that exists already?

>> BuildQueue will get a row whose job column references the same job
>> and which has a particular job_type.
> 
> Yep.
> 
>> One of the things bzr-builder does when it creates the debianised source
>> tree is create a manifest, which is a sort of frozen version of a recipe
>> -- it references particular revisions of the branches so as to allow a
>> repeat of exactly this build.  We could use a manifest like this to
>> actually run the recipe: at the point where the build is requested, we
>> make the manifest and stuff it into the database.  This seems like a
>> neat idea, but isn't how bzr-builder works now as far as I can tell.
> 
> I think this manifest should be stored somewhere with the build job.  As
> I discussed with Jono on Monday, we know we're going to need a new table
> (SourcePackageRecipeBuild) that records a build event to get a source
> package from a recipe.  This table should have the manifest on it.
>
> It's also possible we don't need this table and we can just use the
> BuildSourcePackageFromRecipeJob.

I'm not sure what the difference would be, indeed.  What is the
conceptual difference between a Build and a BuildPackageJob?  I was
under the impression that there was still a split mostly to avoid
rewriting all of Soyuz -- if not, I'd love to know (and try to remember)
what it is.
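
To make the "manifest is a frozen recipe" idea concrete, here is a
small sketch.  It is not how bzr-builder represents recipes or
manifests; BranchRef, freeze() and the revision strings are all
hypothetical names invented for the example.  The point is only that
the manifest is the recipe with every floating branch reference pinned
to the revision that was current when the build was requested.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BranchRef:
    """One branch reference in a recipe (hypothetical structure)."""
    branch_id: int
    revision: Optional[str] = None  # None means "tip at build time"


def freeze(recipe: List[BranchRef], tips: Dict[int, str]) -> List[BranchRef]:
    """Return a manifest: each reference pinned to a specific revision.

    References that already name a revision are kept as they are;
    floating ones take the current tip from `tips`.
    """
    return [
        BranchRef(ref.branch_id, ref.revision or tips[ref.branch_id])
        for ref in recipe
    ]
```

Storing the result of freeze() alongside the job (or build) row would
allow exactly the same build to be repeated later.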

>> This doesn't include anything that will actually create
>> BuildSourcePackageFromRecipeJob rows (say every day for a daily build
>> PPA).  I guess we can worry about this later.
>>
>> I think the current plan is to use bzr-builder to make the debianized
>> source tree and bzr-builddeb to then make the source package.  I
>> presume the process for getting the source package off the builder and
>> into the process of being built will follow that of the existing
>> builders: the builder will tell the buildd-manager where to get the
>> .dsc, the manager will parse this to find the other parts of the package
>> and then grab them, shove all of the files into the librarian and
>> trigger the existing parts of soyuz to look at them somehow[1].
> 
> What happens for binary builds is that the builders return a bunch of
> files that the buildd-manager throws into a directory on disk.  It then
> calls process-upload.py (using Popen :( ) to deal with it.  For a source
> package resulting from a recipe build we can do exactly the same thing.

Cool. 'exactly the same thing' even goes as far as handing them to the
process-upload.py script?

> One thing I need to change though is to stop this use of Popen since it
> blocks everything else on the buildd-manager.  There's a spec for this
> at
> https://blueprints.edge.launchpad.net/soyuz/+spec/buildd-manager-upload-decoupling

If I read the above right, this isn't actually strictly speaking
required to have build from recipe working?

I can certainly see how it would be a good idea though.
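
As a rough illustration of what decoupling could look like, assuming
the blocking comes from waiting for the child process to exit: start
the upload processor, return immediately, and check on it later with a
non-blocking poll.  This is only a sketch using the standard
subprocess module, not the design in the spec; the function names are
made up, and the command line is a stand-in for the real
process-upload.py invocation.

```python
import subprocess
import sys
import time


def start_upload_processor(argv):
    """Start the processor and return at once, without waiting for it."""
    return subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def reap_finished(running):
    """Split processes into (still running, exit codes of finished ones)."""
    still_running, finished = [], []
    for proc in running:
        code = proc.poll()  # non-blocking: None while the child runs
        if code is None:
            still_running.append(proc)
        else:
            finished.append(code)
    return still_running, finished


if __name__ == "__main__":
    # Stand-in command; the real one would run process-upload.py.
    running = [start_upload_processor([sys.executable, "-c", "pass"])]
    exit_codes = []
    while running:
        running, finished = reap_finished(running)
        exit_codes.extend(finished)
        time.sleep(0.05)  # the manager would do other work here
    print(exit_codes)
```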

>> Something that's missing from all the above is how the archive is
>> selected.  It's more or less essential that the
>> BuildSourcePackageFromRecipeJob knows the archive, so that the generated
>> source package can be built for the right one.
> 
> Or we can divorce the archive from the Job entirely and have another
> mechanism that records who requested the build/upload.  This is
> important because we need to observe upload ACLs.

I still don't see why you'd want it separate from the job (job rows
persist beyond the execution of the job).  It would be good to record
the requester on the job though, for sure.

>>   It could be tied to the
>> recipe or it could be supplied when the job row is created.  In some
>> sense the archive is totally orthogonal to the recipe, but OTOH, I can't
>> really see the use case for targeting more than one archive with a
>> recipe.  Advice welcome.
> 
> The recipe, no, the actual source package, yes.

I've got this message now :)

>> In case the above wasn't enough, here's some things I haven't thought
>> hard about:
>>
>>  - do people want to subscribe to a recipe?
>>    - does this mean getting notified when the recipe builds or fails to
>>      build?
>>    - does this mean getting notified when the recipe is changed?
> 
> If a recipe fails to build we need to notify the recipe owner and if the
> person requesting the build is different, them also.
> 
> Soyuz already has a lot of code for dealing with notifications, it
> should be easy enough to hook some more bits in.

OK.

>>  - the whole privacy thing.
>>    - do we only allow recipes to be created that reference branches the
>>      owner can see?
> 
> Makes sense.
> 
>>    - is having the people who can view the recipe being the intersection
>>      of those that can see the branches reasonable?
> 
> Yes.

Cool.

>>    - the issue of accessing private branches from the buildslaves
>>      scares me a bit, I hope we can avoid worrying about that until some
>>      time in 2010.
> 
> Yeah, I had only considered the firewall rules from the slaves.
> Presumably we'll need a buildd-slave SSH key that can access everything?

This sounds too binary for me: I don't think we want anyone who can
build a private recipe at all to be able to access all private branches.
I think we can do something along the lines of embedding credentials in
the HTTP URL.
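
A sketch of what "credentials in the URL" could mean mechanically,
using only the standard library.  Everything specific here is an
assumption: the host name, the idea of a per-build user and token, and
the function name are invented for the example, and nothing in this
thread says how such credentials would be issued or scoped.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url, user, token):
    """Return `url` with user:token@ added to the host part.

    Both values are percent-encoded so that characters such as "/" or
    "@" in a token cannot change how the URL is parsed.
    """
    parts = urlsplit(url)
    netloc = "%s:%s@%s" % (
        quote(user, safe=""), quote(token, safe=""), parts.netloc)
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

The attraction over a single buildd-slave key is that each build job
could be handed URLs that only work for the branches in its recipe.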

>>> The model code should implement the interface ISoyuzJob (although this is a 
>>> terrible name, it will be changed) which is declared in 
>>> lib/lp/soyuz/interfaces/soyuzjob.py.
>> This file doesn't seem to exist?
> 
> See Muharem's response.
> 
>> In coarse outline, building a source package from a recipe isn't very
>> different from building a binary from a source package, so this sounds
>> like it will be a mass of (presumably devilish) details rather than deep
>> design work.
> 
> Hopefully yes.  One thing that we need to make sure of is that *all*
> build jobs must have a determinate build time.
> 
> We currently have a system in place that estimates how long it will be
> before your package build starts.  It does this by adding up the
> previous build times for all the packages in the queue in front of you.
> 
> To be able to continue to do this, we need to be able to look up a
> SourcePackageRecipeBuild for a package name (or the equivalent Job table
> row) and see how long it took to build last time.

I think we want to key off recipe, not source package here.  But it
sounds easy enough (select job.date_finished - job.date_started from
buildsourcepackagefromrecipejob, job where
buildsourcepackagefromrecipejob.job = job.id and job.status = completed
and buildsourcepackagefromrecipejob.recipe = $RECIPE order by
job.date_started desc limit 1, or similar).
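
Spelled out as something runnable, the lookup might be sketched like
this.  It uses an in-memory SQLite database purely for illustration;
the column types, the 'completed' status string, storing times as
integer seconds, and the demo rows are all assumptions, and the table
name follows the BuildSourcePackageFromRecipeJob table proposed
earlier.  estimated_wait() mirrors the existing "add up the previous
build times of everything ahead of you in the queue" approach.

```python
import sqlite3

SCHEMA = """
CREATE TABLE job (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    date_started INTEGER,   -- seconds since the epoch, for simplicity
    date_finished INTEGER
);
CREATE TABLE buildsourcepackagefromrecipejob (
    id INTEGER PRIMARY KEY,
    job INTEGER NOT NULL REFERENCES job(id),
    recipe INTEGER NOT NULL
);
"""

LAST_DURATION = """
SELECT job.date_finished - job.date_started
FROM buildsourcepackagefromrecipejob AS bj
JOIN job ON bj.job = job.id
WHERE job.status = 'completed' AND bj.recipe = ?
ORDER BY job.date_started DESC
LIMIT 1
"""


def last_build_duration(conn, recipe_id):
    """Seconds taken by the most recent completed build, or None."""
    row = conn.execute(LAST_DURATION, (recipe_id,)).fetchone()
    return None if row is None else row[0]


def estimated_wait(conn, queued_recipe_ids, default=0):
    """Sum the last known build times of the jobs ahead in the queue."""
    total = 0
    for recipe_id in queued_recipe_ids:
        duration = last_build_duration(conn, recipe_id)
        total += default if duration is None else duration
    return total


def demo_connection():
    """Build a throwaway database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO job VALUES (?, ?, ?, ?)", [
        (1, "completed", 1000, 1300),
        (2, "completed", 2000, 2200),
        (3, "failed", 3000, 3010),
        (4, "completed", 500, 560),
    ])
    conn.executemany(
        "INSERT INTO buildsourcepackagefromrecipejob VALUES (?, ?, ?)",
        [(1, 1, 7), (2, 2, 7), (3, 3, 7), (4, 4, 8)])
    return conn
```

A recipe that has never been built has no previous duration, which is
why estimated_wait() takes a default.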

> Muharem is refactoring our code right now and making an interface that
> your job classes must implement for us to be able to do this.
> 
>> [1] I guess the fact that these packages aren't signed will bite us in
>>    the ass somehow or other at some point, but I don't know how much it
>>    affects how this bit would work.  We don't *have* to get the source
>>    package files into Soyuz via the Librarian I guess.
> 
> Mark S has suggested that we have a single Launchpad key to sign them,
> so that if the packages are used outside of Launchpad then people know
> where they came from.

I think that probably makes sense.  Don't know where to do the signing
though -- maybe the buildd-master could do it?

Cheers,
mwh

_______________________________________________
Mailing list: https://launchpad.net/~launchpad-dev
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~launchpad-dev
More help   : https://help.launchpad.net/ListHelp
