On Thu, Aug 26, 2010 at 5:49 AM, Richard Purdie <[email protected]> wrote:

> Chris Larson has some great code that can parse variables, python
> functions and shell functions and track dependencies between them. I
> love the code but I don't think it could be directly integrated into
> bitbake in that form so I've been trying to work out what we can do,
> short term for now but also keeping in mind the long term.
>
> I've put some patches together in Poky which functionally should give a
> similar result but integrates into the existing DataSmart code and has a
> better object lifecycle. I've talked a little with Chris offline about
> things but I think this is ready for public discussion so I'm now going
> to go into details about how this is looking.
>

Thanks for putting this together.  I hadn't found the time to look more
closely at a short-term solution, so I appreciate your taking the time to do
this.

> Since the checksums need access to the full data dict, they have to be
> generated either at parse time or at task execution time. If we choose
> task execution time, we can't use the checksums to help decide if the
> task should run or not. I'm therefore aiming to do this at parse time.
>

Agreed, parse time makes the most sense.

> I enabled this in my test branch and the good news is I get lists of
> dependencies that look approximately correct. There are some issues such
> as:
>
> * dependencies due to broken metadata (but the parsing is correct)
> * missing dependencies for some cases the parser isn't reaching
> * missing manual dependencies for the cases we can never hope to find
>  by parsing
>

See
http://github.com/kergoth/openembedded/commit/06852c2388c9c5615128af80da958c76690b96b1
for a start at adding some of the manual bits.  My varref branch on my
bitbake repo implemented a check by monkeypatching DataSmart -- if a task
tried to use a variable that wasn't captured in the signature, it raised an
exception, so I could spot the missing ones.  The thing is, without a check
like that, there's no good way to spot when the signature isn't complete.
So we may want to resurrect something like that, possibly done via the
metadata, or conditionally enabled with a variable.
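
Roughly, the check looked something like this -- a from-memory sketch
rather than the actual varref code, with 'signature_vars' standing in for
wherever the captured names end up living:

    import bb.data_smart

    _real_getVar = bb.data_smart.DataSmart.getVar

    def _checking_getVar(self, var, *args, **kwargs):
        # signature_vars is a hypothetical set of variable names recorded
        # in the task's signature at parse time; None means "don't check".
        captured = getattr(self, 'signature_vars', None)
        if captured is not None and var not in captured:
            raise Exception("%s used but not in the task signature" % var)
        return _real_getVar(self, var, *args, **kwargs)

    bb.data_smart.DataSmart.getVar = _checking_getVar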

> One key change I'm considering is allowing the metadata to mark empty
> "nullop" tasks such as fetchall. We could then short-circuit the
> dependency code for them. Currently, for example, fetchall is a shell
> task that could possibly depend on anything exported into the
> environment.
>

This sounds like a good idea.  Actually, I used to have a patch to bitbake
which made it automatically consider tasks whose getVar returned an empty
string (after .strip()) as no-ops, and wouldn't bother running them.  At the
time it was of limited usefulness, but something like that makes a lot more
sense when we're more closely tracking what needs to happen and why.
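
The heart of that old patch was just a check along these lines (a rough
reconstruction, not the patch itself):

    def task_is_noop(d, task):
        # Treat a task as a no-op if its expanded body is empty or
        # whitespace-only; such a task can be skipped without running.
        body = d.getVar(task, True)
        return body is None or not body.strip()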


> Bitbake has also picked up the habit of injecting functions into the
> methodpool while losing the string representations and function names of
> those functions. I've some local changes adding these back to the
> dictionary so their dependencies can be analysed.
>

I have a patch that does that too -- just a tweak to the AST to set the
function in a variable; you can actually include the function signature in
a flag.  I'd been thinking for a while that eventually we may want
methodpool functions and python functions to merge.  The only differences
today are that methodpool functions are callable directly (though we could
do that for both) and that they have a signature controlled by the
metadata.  Something to think about, anyway.
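
Concretely, the tweak boils down to doing something like this when a
methodpool function is defined -- a sketch, and the 'args' flag is just an
invented name for where a metadata-controlled signature could live:

    def register_methodpool_func(d, funcname, body, argspec=None):
        # Store the body in the datastore so the dependency parser can
        # see it, and flag it as a python function.
        d.setVar(funcname, body)
        d.setVarFlag(funcname, 'func', True)
        d.setVarFlag(funcname, 'python', True)
        if argspec is not None:
            # Hypothetical flag holding the function signature.
            d.setVarFlag(funcname, 'args', argspec)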

> The bad news is that parsing took around 10 times longer :(.
> Profiling reveals it's the python function parsing that is taking the
> majority of the time. As an idea of the numbers:
>
> Parsing 85 recipes went from taking 4 seconds to 40 seconds. That
> involved parsing 8500 python functions, and the bulk of the time was
> spent looping over the 1.4 million tokens those functions split into.
>
> By swapping NodeVisitor for ast.walk, I was able to get 10% speed back
> but the main drain is the 1.4 million tokens which we have to iterate
> over. I think it should be possible to put a cache around the python
> parsing code and reduce the number of real parsing calls so I will look
> into that next.


Yikes.  A cache of some sort sounds like a good idea.
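
Something as simple as memoizing on the function body might recover most
of the time, since recipes inheriting the same classes share nearly all of
their python functions.  A sketch, with _do_real_parse standing in for the
existing ast-based walk:

    import hashlib

    _parse_cache = {}

    def parse_python_deps(body):
        # Key the cache on a hash of the function body so identical
        # functions across recipes are only parsed once.
        key = hashlib.md5(body.encode('utf-8')).hexdigest()
        if key not in _parse_cache:
            _parse_cache[key] = _do_real_parse(body)
        return _parse_cache[key]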

Hmm.  My first thought would be to prepopulate that information at
ConfigParsed time, for variables that aren't changed by the recipe itself,
but I don't know how that would interact with the COW implementation.
Another possibility might be to capture the classes which get inherited by
a given recipe in our cache, and leverage that to pre-populate the
information for those classes, but only if the recipe doesn't change them
later on -- the advantage being that the cache would live in the server
process, so the children would be able to use it as a starting point.
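
In rough pseudocode, with all the names invented:

    def prepopulate_deps(config_data, compute_deps):
        # Compute dependency info once for everything defined at
        # ConfigParsed time.
        return dict((var, compute_deps(config_data, var))
                    for var in config_data.keys())

    def recipe_deps(recipe_data, base_deps, changed_vars, compute_deps):
        # Start from the precomputed table and only redo the variables
        # this recipe actually changed.
        deps = dict(base_deps)
        for var in changed_vars:
            deps[var] = compute_deps(recipe_data, var)
        return deps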

Hmmm.
-- 
Christopher Larson
clarson at kergoth dot com
Founder - BitBake, OpenEmbedded, OpenZaurus
Maintainer - Tslib
Senior Software Engineer, Mentor Graphics