Hi,

We've talked about checksums/signatures to represent the metadata since forever. I'll try to give an update on where this currently stands.
Background: Why would they be useful?

Currently it's very hard to know whether two builds, X and Y, are the same or different in some way. If we knew they were the same, we could take shortcuts, for example by reusing prebuilt objects. We would also like to know whether an existing build is valid or whether we need to rebuild it.

Background: What do we need to generate one?

The key information is all the "inputs" that go into a given build. In bitbake, this equates to the state of the data dictionary. We could just checksum the entire dictionary, but then a single change would invalidate every checksum, even if a task wasn't affected by that change. We can do better, but to do so we need a list of dependencies between variables.

State of play:

Chris Larson has some great code that can parse variables, python functions and shell functions and track dependencies between them. I love the code but I don't think it could be integrated into bitbake directly in that form, so I've been trying to work out what we can do in the short term while keeping the long term in mind. I've put some patches together in Poky which should give a functionally similar result but integrate into the existing DataSmart code and have a better object lifecycle. I've talked a little with Chris offline about things, but I think this is ready for public discussion, so I'm now going to go into the details of how this is looking.

Since the checksums need access to the full data dictionary, they have to be generated either at parse time or at task execution time. If we choose task execution time, we can't use the checksums to help decide whether the task should run at all. I'm therefore aiming to do this at parse time. I enabled this in my test branch and the good news is that I get lists of dependencies which look approximately correct.
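To make the idea concrete, here is a minimal sketch of how per-variable dependency lists let a task's checksum cover only the variables it actually depends on, so an unrelated change doesn't invalidate it. The variable names and helper functions here are illustrative assumptions, not bitbake's actual implementation:

```python
import hashlib

# Hypothetical data dictionary and per-variable dependency lists;
# CC/CFLAGS/do_compile are illustrative, not from a real recipe.
data = {
    "CC": "gcc",
    "CFLAGS": "-O2",
    "do_compile": "${CC} ${CFLAGS} -c foo.c",
}
deps = {
    "do_compile": {"CC", "CFLAGS"},
    "CC": set(),
    "CFLAGS": set(),
}

def dep_closure(var):
    """Return var plus everything it transitively depends on."""
    seen = set()
    stack = [var]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(deps.get(v, ()))
    return seen

def task_checksum(task):
    """Hash only the variables in the task's dependency closure,
    visiting them in a stable (sorted) order."""
    h = hashlib.sha256()
    for v in sorted(dep_closure(task)):
        h.update(v.encode())
        h.update(data.get(v, "").encode())
    return h.hexdigest()

before = task_checksum("do_compile")
data["CFLAGS"] = "-O2 -g"                     # a variable the task uses...
assert task_checksum("do_compile") != before  # ...so its checksum changes
```

The point of the closure is that a change to any variable the task reaches, directly or indirectly, changes the checksum, while edits to unrelated variables leave it untouched.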
There are some issues, such as:

 * dependencies due to broken metadata (where the parsing itself is correct)
 * missing dependencies for some cases the parser isn't reaching
 * missing manual dependencies for the cases we can never hope to find by parsing

but on the whole the results look good and the above issues are things that can be fixed.

One key change I'm considering is allowing the metadata to mark empty "nullop" tasks such as fetchall. We could then short out the dependency code for them. Currently, for example, fetchall is a shell task that could possibly depend on anything exported into the environment.

Bitbake has also picked up the habit of injecting functions into the methodpool, losing the string representations and function names of those functions. I have some local changes adding these back into the dictionary so their dependencies can be analysed.

The bad news is that parsing took a factor of 10 longer :(. Profiling reveals it's the python function parsing that takes the majority of the time. As an idea of the numbers: parsing 85 recipes went from taking 4 seconds to 40 seconds. That run involved parsing 8500 python functions and, in doing so, the main time was spent looping over the 1.4 million tokens those python functions split into. By swapping NodeVisitor for ast.walk, I was able to get 10% of the speed back, but the main drain is still the 1.4 million tokens we have to iterate over. I think it should be possible to put a cache around the python parsing code and reduce the number of real parsing calls, so I will look into that next.

Cheers,

Richard

_______________________________________________
Bitbake-dev mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/bitbake-dev
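As an illustration of the caching idea mentioned above, a sketch along these lines would memoize the expensive ast parse on the function's source text, so a body shared by many recipes (e.g. inherited from a common class) is only parsed once. The helper name and the cache shape are assumptions for illustration, not bitbake's actual code:

```python
import ast

_parse_cache = {}

def referenced_names(code):
    """Return the set of names a python function body references.

    Results are cached on the source text, so identical bodies are
    only parsed once; repeat calls are dictionary lookups."""
    try:
        return _parse_cache[code]
    except KeyError:
        pass
    names = set()
    # ast.walk iterates all nodes without the per-node dispatch
    # overhead of a NodeVisitor subclass.
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Name):
            names.add(node.id)
    _parse_cache[code] = names
    return names

body = "result = d.getVar('CC') + CFLAGS\n"
first = referenced_names(body)
assert referenced_names(body) is first  # second call is a cache hit
```

Whether this helps in practice depends on how often function bodies repeat across recipes; if most of the 8500 functions are duplicates, the cache should cut the real parsing calls dramatically.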
