Cleaning up the Linux kernel's 'Dependency Hell': This developer is proposing 
2,200 commit changes

His fast-headers tree modifies over half of all kernel source files, and offers 
a +50-80% improvement in absolute kernel build performance.

Cleaning up decades of code mess isn't for the faint of heart, but leading 
Linux kernel developer Ingo Molnar is giving it the old college try.

Written by Steven Vaughan-Nichols, Senior Contributing Editor. Posted in Linux 
and Open Source on January 4, 2022
https://www.zdnet.com/article/cleaning-up-the-linux-kernels-dependency-hell-this-developer-is-proposing-2200-commit-changes/


Last year, Linux's source code came to a whopping 27.8 million lines of code. 
It's only gotten bigger since then.

Like any 30-year-old software project, Linux has picked up its fair share of 
cruft over the years. Now, after months of work, senior Linux kernel developer 
Ingo Molnar is releasing his first stab at cleaning it up at a fundamental 
level with his "Fast Kernel Headers" project.

The object? No less than a comprehensive clean-up and rework of the Linux 
kernel's header hierarchy and header dependencies.

Linux contains many header (.h) files. To be exact, there are about 10,000 main 
.h headers in the Linux kernel, within the include/ and arch/*/include/ 
hierarchies.

As Molnar explained, "Over the last 30+ years they have grown into a 
complicated & painful set of cross-dependencies we are affectionately calling 
'Dependency Hell'."
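
To make the idea of cross-dependencies concrete, here is a minimal sketch of 
how a handful of direct #include lines can fan out into a much larger 
transitive set. The include graph and header names below are hypothetical and 
chosen only for illustration; they are not taken from the kernel tree or from 
Molnar's patches.

```python
from typing import Dict, List, Set

# Hypothetical include graph: header -> headers it directly #includes.
# These names and edges are illustrative assumptions, not real kernel data.
INCLUDES: Dict[str, List[str]] = {
    "sched.h": ["types.h", "mm_types.h", "signal.h"],
    "mm_types.h": ["types.h", "spinlock.h"],
    "signal.h": ["types.h", "sched.h"],  # cross-dependency: points back at sched.h
    "spinlock.h": ["types.h"],
    "types.h": [],
}


def transitive_includes(header: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every header reachable from `header`, excluding `header` itself."""
    seen: Set[str] = {header}  # tracking visited headers keeps cycles from looping
    stack: List[str] = [header]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen - {header}


if __name__ == "__main__":
    for name in sorted(INCLUDES):
        direct = len(INCLUDES[name])
        total = len(transitive_includes(name, INCLUDES))
        print(f"{name}: {direct} direct, {total} transitive")
```

In this toy graph, signal.h names only two headers directly yet ends up 
pulling in four, because one of them drags in the rest. Across thousands of 
real headers, that fan-out is the kind of cost the clean-up targets.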

To bring rhyme and reason to all this, Molnar is proposing to make 2,200 commit 
changes to the code. That's a lot of commits! Why so many? Well, Molnar 
continued, it turns out there's a lot more mess in all that code than he 
thought there was when he started his clean-up project in late 2020. To be 
exact:

- When I started this project, late 2020, I expected there to be maybe 50-100 
patches. I did a few crude measurements that suggested that about 20% build 
speed improvement could be gained by reducing header dependencies, without 
having a substantial runtime effect on the kernel. Seemed substantial enough to 
justify 50-100 commits.

- But as the number of patches increased, I saw only limited performance 
increases. By mid-2021 I got to over 500 commits in this tree and had to throw 
away my second attempt (!), the first two approaches simply didn't scale, 
weren't maintainable and barely offered a 4% build speedup, not worth the churn 
of 500 patches and not worth even announcing.

- With the third attempt I introduced the per_task() machinery which brought 
the necessary flexibility to reduce dependencies drastically, and it was a 
type-clean approach that improved maintainability. But even at 1,000 commits I 
barely got to a 10% build speed improvement. Again this was not something I 
felt comfortable pushing upstream or even announcing. :-/

- But the numbers were pretty clear: 20% performance gains were very much 
possible. So I kept developing this tree, and most of the speedups started 
arriving after over 1,500 commits, in the fall of 2021. I was very surprised 
when it went beyond 20% speedup and more than arrived at the current 78% with 
my reference config.

There's a clear super-linear improvement property of kernel build overhead, 
once the number of dependencies is reduced to the bare minimum.

So, today, his cleaned-up "fast-headers tree offers a +50-80% improvement in 
absolute kernel build performance on supported architectures, depending on the 
config. This is a major step forward in terms of Linux kernel build efficiency 
& performance."

A 50 to 80% improvement is well worth the time and trouble. These savings come 
from shrinking the default headers by 1-2 orders of magnitude; with the 
fast-headers tree, they will mostly contain type definitions.
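
It is easy to misread a "+50-80% improvement" as a 50-80% cut in build time. 
The sketch below works through the arithmetic under one assumption: that 
"build performance" means throughput (builds completed per unit of time). The 
article does not define the metric, so treat this as one plausible reading. 
The function names and the 100-second baseline are hypothetical.

```python
def build_time_after(baseline_seconds: float, improvement_pct: float) -> float:
    """Wall-clock build time after a throughput gain of `improvement_pct` percent.

    Assumes "improvement" means more builds per unit time, so the time per
    build shrinks by the factor 1 / (1 + improvement_pct / 100).
    """
    if improvement_pct <= -100.0:
        raise ValueError("improvement_pct must be greater than -100")
    return baseline_seconds / (1.0 + improvement_pct / 100.0)


def time_saved_pct(improvement_pct: float) -> float:
    """Percentage of wall-clock time saved for a given throughput gain."""
    return 100.0 * (1.0 - build_time_after(1.0, improvement_pct))


if __name__ == "__main__":
    baseline = 100.0  # hypothetical baseline build time, in seconds
    for gain in (50.0, 78.0, 80.0):
        after = build_time_after(baseline, gain)
        saved = time_saved_pct(gain)
        print(f"+{gain:.0f}% throughput: {after:.1f}s per build, {saved:.1f}% less time")
```

Under that reading, the quoted +50-80% range corresponds to roughly a third to 
just under half of the build time saved, not a 50-80% reduction.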

But, wait, those 2,200 commits are only the tip of the iceberg. Those changes 
will touch almost every part of the Linux kernel.

All together, Molnar estimates that "in addition to the aforementioned 25 
sub-trees and 2,200 commits, the fast-headers tree modifies over half of all 
kernel source files in existence."

It's going to change 25,288 files with 178,024 insertions and 74,720 deletions.

In other words, "Yeah, so this is probably the largest single feature 
announcement in LKML's [Linux Kernel Mailing List] history. Not by choice! :-/"

On top of this, making these changes doable will require aggressive decoupling 
of high-level headers; type and API header decoupling; automated dependency 
addition to .h and .c files; and optimizing headers. This will not be easy.

So, before pulling the trigger and starting to make these changes, Molnar is 
gathering feedback from his fellow maintainers. In particular, he'd love to 
hear from "Linus [Torvalds] & Andrew [Morton] and the other maintainers of 
the biggest subsystems affected by these changes."

Greg Kroah-Hartman, the Linux kernel maintainer for the Linux stable branch, 
thinks "This is 'interesting,' but how are you going to keep the 
kernel/sched/per_task_area_struct_defs.h and struct task_struct_per_task 
definition in sync?" In short, who gets to bell the cat of maintaining all 
these changes?

Molnar replied that he's willing to tackle this job and that he doesn't think 
it will be that much trouble.

Kroah-Hartman then gave Molnar's efforts his blessing and remarked, "I'll leave 
all of this up to the scheduler developers, but it still looks odd to me. The 
mess we create trying to work around issues in C :)"

He's not wrong. This is one reason why there are efforts afoot to make Rust 
Linux's second language.

If the changes are adopted, users won't see any real difference. But Linux 
kernel and distro developers will be able to compile Linux faster than ever. 
The result will be to make it easier and quicker than ever to make 
improvements, write patches, and add features to Linux.
