Hi,

We've talked about checksums/signatures to represent the metadata since
forever. I'll try and give an update on where this currently stands.

Background: Why would they be useful?

Currently it's very hard to know whether two builds, X and Y, are the
same or differ in some way. If we knew they were the same, we might be
able to take shortcuts, for example by reusing prebuilt objects. We
would also like to know whether an existing build is valid or whether we
need to rebuild it.

Background: What do we need to generate one?

The key information is all the "inputs" that go into a given build. In
bitbake, this equates to the state of the data dictionary. We could just
checksum the entire dictionary but that means one change would
invalidate every checksum, even if a task wasn't affected by the change.
We can do better.

We therefore need a list of dependencies between variables.
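To illustrate the idea, here is a minimal sketch (not bitbake's actual
code; the function and structure names are made up for this example) of
hashing only the variables a task depends on, so an unrelated change
doesn't invalidate every task's checksum:

```python
import hashlib

def task_checksum(d, task, deps):
    """Hash only the variables a task depends on.  'd' is a plain dict
    standing in for bitbake's data dictionary; 'deps' maps each task to
    the set of variables it (transitively) depends on."""
    h = hashlib.md5()
    for var in sorted(deps[task]):  # sorted for a stable digest
        h.update(("%s=%s\n" % (var, d.get(var, ""))).encode("utf-8"))
    return h.hexdigest()

d = {"CC": "gcc", "CFLAGS": "-O2", "SRC_URI": "http://example.com/foo.tar.gz"}
deps = {"do_compile": {"CC", "CFLAGS"}, "do_fetch": {"SRC_URI"}}

before = task_checksum(d, "do_compile", deps)
d["SRC_URI"] = "http://example.com/foo-1.1.tar.gz"  # unrelated change
assert task_checksum(d, "do_compile", deps) == before  # do_compile unaffected
```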

State of play:

Chris Larson has some great code that can parse variables, python
functions and shell functions and track dependencies between them. I
love the code but I don't think it could be directly integrated into
bitbake in that form, so I've been trying to work out what we can do,
focusing on the short term for now but keeping the long term in mind.

I've put some patches together in Poky which should give a functionally
similar result but integrate into the existing DataSmart code and have a
better object lifecycle. I've talked a little with Chris offline about
this but I think it's ready for public discussion, so I'm now going
to go into details about how this is looking.

Since the checksums need access to the full data dict, they have to be
generated either at parse time or at task execution time. If we choose
task execution time, we can't use the checksums to help decide if the
task should run or not. I'm therefore aiming to do this at parse time.

I enabled this in my test branch and the good news is I get lists of
dependencies that look approximately correct. There are some issues such
as:

* dependencies due to broken metadata (but the parsing is correct)
* missing dependencies for some cases the parser isn't reaching
* missing manual dependencies for the cases we can never hope to find 
  by parsing

but on the whole the results look good and the above issues are things
that can be fixed.
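For the last class of issue, one option is to let recipe authors declare
dependencies the parser can never find. A sketch (the "[vardeps]" flag
name and helper are assumptions for illustration, not an agreed
interface):

```python
def variable_deps(d, var, parsed_deps):
    """Combine parsed dependencies with manually declared ones.  A
    hypothetical '[vardeps]' flag lets metadata list dependencies the
    parser can't discover, e.g. variable names built up at runtime."""
    deps = set(parsed_deps.get(var, ()))
    deps.update(d.get(var + "[vardeps]", "").split())
    return deps

d = {"do_install[vardeps]": "PACKAGE_ARCH STAGING_DIR"}
parsed = {"do_install": {"D", "bindir"}}
assert variable_deps(d, "do_install", parsed) == {"D", "bindir", "PACKAGE_ARCH", "STAGING_DIR"}
```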

One key change I'm considering is allowing the metadata to mark empty
"nullop" tasks such as fetchall. We could then short-circuit the
dependency code for them. Currently, for example, fetchall is a shell
task that could possibly depend on anything exported into the
environment.
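The short-circuit could look something like this sketch (the "[nullop]"
flag name is an assumption, and parse_deps stands in for the real
dependency parser):

```python
def task_dependencies(d, task, parse_deps):
    """Return the variable dependencies for a task.  Tasks flagged with
    a hypothetical '[nullop]' marker contribute no dependencies, so a
    shell do_fetchall no longer appears to depend on everything
    exported into the environment."""
    if d.get(task + "[nullop]"):
        return set()
    return parse_deps(task)

d = {"do_fetchall[nullop]": "1"}
parse_deps = lambda task: {"PATH", "CC", "CFLAGS"}  # stand-in for the real parser
assert task_dependencies(d, "do_fetchall", parse_deps) == set()
assert task_dependencies(d, "do_compile", parse_deps) == {"PATH", "CC", "CFLAGS"}
```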

Bitbake has also picked up the habit of injecting functions into the
methodpool, losing the string representations and function names of
those functions in the process. I have some local changes adding these
back to the dictionary so their dependencies can be analysed.

The bad news is that parsing takes about 10 times longer :(.
Profiling reveals it's the python function parsing that is taking the
majority of the time. As an idea of numbers:

Parsing 85 recipes went from taking 4 seconds to 40 seconds. That
involved parsing 8500 python functions, and the main time was spent
looping over the 1.4 million tokens those python functions split into.

By swapping NodeVisitor for ast.walk, I was able to get 10% speed back
but the main drain is the 1.4 million tokens which we have to iterate
over. I think it should be possible to put a cache around the python
parsing code and reduce the number of real parsing calls so I will look
into that next.
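The cache could be as simple as keying on the function's source text,
since many recipes inherit identical functions from the same classes.
A sketch (not the actual patch; names are made up):

```python
import ast

_dep_cache = {}

def python_func_deps(code):
    """Extract the names referenced by a python function body, caching
    by source text so identical function bodies are parsed only once."""
    try:
        return _dep_cache[code]
    except KeyError:
        names = set()
        for node in ast.walk(ast.parse(code)):  # ast.walk avoids NodeVisitor dispatch overhead
            if isinstance(node, ast.Name):
                names.add(node.id)
        _dep_cache[code] = names
        return names

code = "result = compute(FOO) + BAR"
assert python_func_deps(code) == {"result", "compute", "FOO", "BAR"}
assert python_func_deps(code) is python_func_deps(code)  # second call hits the cache
```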

Cheers,

Richard

_______________________________________________
Bitbake-dev mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/bitbake-dev
