Re: [Bitbake-dev] Implementing support for the remote fetching of source control revision information

Richard Purdie Sun, 01 Jul 2007 17:11:18 -0700

On Mon, 2007-07-02 at 01:10 +0200, Henryk Plötz wrote:
> > What does refname default to? It always has to be set?
> 
> No default, yet. I figured since this would only be needed by very few
> packages it would be ok to force all of them to set the parameter. But
> see below.


In the usual case I think there will be only one floating source in the
SRC_URI so we should try and handle that nicely.

> > > In order to not have to go to the network for each
> > > revision_identifier lookup I initially just cached it in the
> > > bb.data object that is always passed around. Didn't work. So I then
> > > tried to cache it in the bb.fetch module (e.g. like the already
> > > existing urldata). That did work, somewhat, but left me wondering
> > > for hours why it sometimes just did not work. With no indication to
> > > the problem whatsoever. I then found out that you sneaked an
> > > os.fork() into bb.runqueue which of course makes all caching in
> > > memory impossible.
> > 
> > "sneaked" isn't quite the right word, bitbake 1.8 was widely
> > advertised as multithreaded so it had to fork somewhere! It forks
> > even in the single thread case so people can't make bad
> > assumptions ;-).
> 
> Well, when I hear "multithreaded" I usually assume threads to be
> used and not processes :-)
> Though I know that there are some issues with Python threading (namely
> the Global Interpreter Lock), this should not pose a problem since
> bitbake doesn't do extensive calculations in Python code (in the
> children at least).

The reason for C-style processes rather than threads is that a single
Python process effectively only runs Python code on one processor at a
time, and we wanted to take full advantage of whatever hardware was
available.

> > I'd be interested to know which cases this didn't work in. At parse
> > time bitbake is single threaded (and always will be effectively) so
> > if you calculate SRCREV at parse time, you should be able to save it.
> 
> Hmm, I can't pinpoint the exact necessary circumstances under which my
> code is called too late, but I think it just always happens when the
> normal cache is intact (i.e. not every .bb file is parsed on startup).
> I then clearly see lots of different results for os.getpid(), i.e. I am
> always in a new forked child.

There are two cases where you'd get called:

1. You're parsed on startup for dependency handling

2. The .bb file is being run 


You'd need to save state somewhere in case 1 for use in case 2, which is
the problem you'd have hit. FWIW, we found it *much* cheaper to throw
away the unneeded data after case 1 and reparse any files needed, which
is why we reparse rather than cache the whole data dict.

> It's true that this might also be solved by cleaner integration into
> the dependency calculation stage.

I think it's worth looking at (see below).

> > Ok, I think some kind of on disk caching will be needed since
> > different users are going to have different needs and hence different
> > "cache policies". Personally, I'd want to be able to manually reset
> > the cache when I wanted updates rather than having the system always
> > go for them. I think it would be better to have things in memory once
> > bitbake is running in a given session though so perhaps we can create
> > some kind of hybrid here with policy controlled by a variable?
> 
> Oh, that's a great idea and I entirely support it. There need to be
> some more changes though: All fetchers must create some consistent
> image of the current remote revision and then be modified to actually
> check out and use that revision. Then it should be no problem to write
> that current state to disk and reuse it at a later time.

That seems reasonable.
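
As a rough sketch of the hybrid idea (in memory within a session, on
disk between sessions, with the policy chosen by a setting), something
like the following could work. Everything here is an assumption made for
illustration: the class name RevisionCache, the policy values "cached"
and "always", the JSON file format and the lookup callback are not part
of bitbake.

```python
import json
import os


class RevisionCache:
    """Hypothetical hybrid revision cache: memory first, disk between runs."""

    def __init__(self, path, policy="cached"):
        self.path = path
        self.policy = policy
        self._memory = {}
        # "cached": reuse revisions saved by an earlier session.
        # "always": ignore the file, so each session asks the network once.
        if policy == "cached" and os.path.exists(path):
            with open(path) as f:
                self._memory = json.load(f)

    def get(self, url, lookup):
        """Return the revision for url, calling lookup(url) only if needed."""
        if url not in self._memory:
            self._memory[url] = lookup(url)
            self._save()
        return self._memory[url]

    def reset(self):
        """Manually drop all cached revisions, in memory and on disk."""
        self._memory.clear()
        if os.path.exists(self.path):
            os.remove(self.path)

    def _save(self):
        with open(self.path, "w") as f:
            json.dump(self._memory, f)
```

With the "cached" policy the remote lookup only happens again after
reset() is called, which matches the "manually reset the cache when I
want updates" use case. Because the state lives in a file, it also
survives the fork between parsing and running a .bb file.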

> > That workaround is truly hideous ;-). Perhaps we should add an
> > optional argument to bb.fetch.init() which allows it to only run
> > against remote sources (marking the fetchers as local/remote). It
> > might be worth just adding a special version of the function...
> 
> Two things: 
> 
> One: I just thought about a better way to reference the revision
> information which virtually eliminates the problem (and might have some
> nice properties): Use Python's __getattr__. I would define a magical
> object (maybe "bb.fetch.rev" or something like that) with a
> __getattr__() method. Then you would use something like
> [EMAIL PROTECTED] instead of ${SRCREV_svn1}.
>
> This way, code is being run when the revision information is requested
> and the code can do arbitrary things. For starters it could just return
> a fixed string when the information is not yet available ("" or "now").
> It could also assign some default refname for URLs that don't have a
> refname set.

I wondered about this, but I don't like variables magically changing
value as it's asking for lots of trouble in the future. If we can avoid
doing that, I'd like to.
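
For reference, the __getattr__ idea quoted above might look roughly like
this. It is only a sketch under assumptions: the class name
RevisionProxy, the record() method and the "now" placeholder are
hypothetical, and nothing like this exists in bb.fetch.

```python
class RevisionProxy:
    """Hypothetical object whose attributes resolve to revision strings."""

    def __init__(self, placeholder="now"):
        self._known = {}
        self._placeholder = placeholder

    def record(self, refname, revision):
        """Store the revision once a fetcher has looked it up."""
        self._known[refname] = revision

    def __getattr__(self, refname):
        # Python calls __getattr__ only when normal attribute lookup
        # fails, so _known and _placeholder never reach this method
        # during ordinary use.
        if refname.startswith("_"):
            raise AttributeError(refname)
        # Return the placeholder while the revision is not yet known.
        return self._known.get(refname, self._placeholder)


if __name__ == "__main__":
    rev = RevisionProxy()
    print(rev.svn1)            # placeholder, nothing recorded yet
    rev.record("svn1", "1234")
    print(rev.svn1)            # recorded revision
```

The same expression, rev.svn1, yields a different value before and after
the lookup, which is exactly the "magically changing" behaviour I'd like
to avoid.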

> The other thing: Yes, bb.fetch IMHO needs some attention. I entirely
> dislike the approach of keeping data in the module global namespace. I
> also don't like the way fetcher objects are created/used.

Yes, I dislike the fetcher code too. What you see is my partial attempt
to turn it into something more object oriented. I say partial as it
isn't finished; I found some pitfalls which need some thought to address
properly:

1. As mentioned above, we parse some files twice and need the objects
cached between runs for efficiency.
2. The parser makes calls into the fetcher in really nasty ways which
break attempts to turn things into neat objects.

Problem 1 is very similar to the problem of caching revisions so perhaps
we can find a way to address both...

Cheers,

Richard



