On Thu, Aug 26, 2010 at 5:49 AM, Richard Purdie <[email protected]> wrote:
> Chris Larson has some great code that can parse variables, python
> functions and shell functions and track dependencies between them. I
> love the code but I don't think it could be directly integrated into
> bitbake in that form so I've been trying to work out what we can do,
> short term for now but also keeping in mind the long term.
>
> I've put some patches together in Poky which functionally should give a
> similar result but integrates into the existing DataSmart code and has a
> better object lifecycle. I've talked a little with Chris offline about
> things but I think this is ready for public discussion so I'm now going
> to go into details about how this is looking.

Thanks for putting this together. I hadn't found the time to look closer
at a short term solution, so I appreciate your taking the time to do this.

> Since the checksums need access to the full data dict, they have to be
> generated either at parse time or at task execution time. If we choose
> task execution time, we can't use the checksums to help decide if the
> task should run or not. I'm therefore aiming to do this at parse time.

Agreed, parse time makes the most sense.

> I enabled this in my test branch and the good news is I get lists of
> dependencies that look approximately correct. There are some issues such
> as:
>
> * dependencies due to broken metadata (but the parsing is correct)
> * missing dependencies for some cases the parser isn't reaching
> * missing manual dependencies for the cases we can never hope to find
>   by parsing

See http://github.com/kergoth/openembedded/commit/06852c2388c9c5615128af80da958c76690b96b1
for a start at adding some of the manual bits.

My varref branch on my bitbake repo implemented a check by monkeypatching
DataSmart -- if a task tried to use a variable that wasn't captured in the
signature, it raised an exception, so I could spot the missing ones.
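Roughly, the check worked along these lines -- this is a minimal sketch, not the
actual varref code; the exception name and the wrapper helper are invented for
illustration:

```python
# Sketch only: install_signature_check and UncapturedVariableError are
# illustrative names, not bitbake's actual API.

class UncapturedVariableError(Exception):
    """Raised when a task reads a variable missing from its signature."""

def install_signature_check(datasmart_class, captured_vars):
    """Monkeypatch getVar on the data class so any access to a variable
    outside captured_vars raises, exposing incomplete signatures."""
    original_getVar = datasmart_class.getVar

    def checked_getVar(self, var, *args, **kwargs):
        if var not in captured_vars:
            raise UncapturedVariableError(
                "variable %r used but not in the task signature" % var)
        return original_getVar(self, var, *args, **kwargs)

    datasmart_class.getVar = checked_getVar
    return original_getVar  # so the caller can restore the original later
```

The point being that a task which sneaks a variable past the parser fails
loudly at run time instead of silently producing a stale checksum.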
The thing is, without a check like that, there's no good way to spot when
the signature isn't complete. So we may want to resurrect something like
that, possibly done via the metadata, or conditionally enabled with a
variable.

> One key change I'm considering is allowing the metadata to mark empty
> "nullop" tasks such as fetchall. We could then short out the dependency
> code for them. Currently for example fetchall is a shell task that could
> possibly depend on anything exported into the environment.

This sounds like a good idea. Actually, I used to have a patch to bitbake
which made it automatically consider tasks whose getVar returned an empty
string (after .strip()) as no-ops, and wouldn't bother running them. At
the time it was of limited usefulness, but something like that makes a lot
more sense when we're more closely tracking what needs to happen and why.

> Bitbake has also picked up the habit of injecting functions into the
> methodpool but losing the string representations and function names of
> those functions. I've some local changes adding these back to the
> dictionary so their dependencies can be analysed.

I have a patch that does that too, just a tweak to the ast to set it in a
variable (you can actually include the function signature in a flag). I'd
been thinking for a while that eventually we may want methodpool functions
and python functions to merge; the only differences today are that the
methodpool functions are callable directly (though we could do that for
both), and that they have a signature controlled by the metadata.
Something to think about, anyway.

> The bad news is that parsing took a factor of 10 times longer :(.
> Profiling reveals it's the python function parsing that is taking the
> majority of the time. As an idea of numbers:
>
> Parsing 85 recipes went from taking 4 seconds to 40 seconds.
> That parsing involved parsing 8500 python functions and in doing so, the
> main time was spent looping over 1.4 million tokens that those python
> functions split into.
>
> By swapping NodeVisitor for ast.walk, I was able to get 10% speed back
> but the main drain is the 1.4 million tokens which we have to iterate
> over. I think it should be possible to put a cache around the python
> parsing code and reduce the number of real parsing calls so I will look
> into that next.

Yikes. A cache of some sort sounds like a good idea.

Hmm. First thought would be to suggest prepopulating that information at
ConfigParsed time, for variables that aren't changed by the recipe itself,
but I don't know how that would interact with the COW implementation.

Another possibility might be to capture the classes which get inherited by
a given recipe in our cache, and leverage that to pre-populate the
information for those, only if the recipe doesn't change them later on --
the advantage being the cache would be in the server process, so the
children would be able to get to it as a starting point. Hmmm.

-- 
Christopher Larson
clarson at kergoth dot com
Founder - BitBake, OpenEmbedded, OpenZaurus
Maintainer - Tslib
Senior Software Engineer, Mentor Graphics
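P.S. To make the caching idea concrete: since many recipes inherit the same
classes, the same function bodies recur over and over, so a cache keyed on the
body text would avoid re-tokenizing them. A rough sketch only -- the helper
name is invented, not actual bitbake code:

```python
# Sketch: cache python-function dependency extraction keyed on the body
# text, so identical bodies (e.g. from widely-inherited classes) are
# parsed exactly once. python_func_deps is an illustrative name.
import ast

_dep_cache = {}

def python_func_deps(body):
    """Return the set of names a python function body reads,
    parsing each distinct body text only once."""
    deps = _dep_cache.get(body)
    if deps is not None:
        return deps
    deps = set()
    # ast.walk flat-iterates every node, avoiding the per-node-type
    # dispatch overhead of a NodeVisitor subclass.
    for node in ast.walk(ast.parse(body)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            deps.add(node.id)
    _dep_cache[body] = deps
    return deps
```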
_______________________________________________
Bitbake-dev mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/bitbake-dev
