On Wed, Jan 29, 2014 at 1:54 AM, jan i <j...@apache.org> wrote: > On 29 January 2014 10:18, Andre Fischer <awf....@gmail.com> wrote: > > > I would like to report some observations that I made when thinking about > > how to make building OpenOffice with one global makefile feasible. It > will > > probably the last of build related mails in the near future. > > > > Traditional make uses a top-down approach. It starts with a target, > 'all' > > by default, and looks at its dependencies. When one of those has to be > > made or is newer then the target then the target also has to be made. > This > > is done recursively and depth-first. Every file on which 'all' has a > > direct or indirect dependency has to be checked. If we would build > > OpenOffice with one makefile (included makefiles don't count) then that > are > > a lot of files to check. There are about 9800 cxx files, 3500 c files, > > 12000 hxx files, and lot of external headers. Checking the modification > > time of so many files is one of the reasons for the delay in , say, sw/ > > between starting make and its first action. > > > > As I don't have all global dependencies in a format that would allow > > experimation, I tried how long it would take to get the mtime (time of > last > > modification) for all files, source, generated, compiled, linked, about > > 120000. I wrote a small Java program for that. With a warm cache that > > takes about 23s. When run in 4 threads this reduced to less than 8s. > > Could be worse. > > > > But it also could be better because in general there are only a few files > > modified, usually files that you modified yourself in an editor. There > is > > another approach, introduced, as far as I know, by the tup [1] build > tool, > > that is bottom up. If you had something similar to the oracle of > > complexity theory, that gave you the list of modified files since the > last > > build, you could find the depending files much faster. Faster for two > > reasons. Firstly, there is only one path in the dependency tree up > towards > > the root (while there are many down from the root). Only targets on this > > path are affected by the modified file. Secondly, the dependency analysis > > is comparatively cheap. The expensive part is to determine the file > > modification times. If they where miraculously given then even the > > top-down approach would not take noticably longer. > > > > So I made another experiment to see if such an oracle can be created. > > Java 7 has the java.nio.file.WatchService that lets you monitor file > > modfifications in one directory. I registered it to all directories in > our > > source tree (some 16000 directories). With the WatchService in place > every > > file modification can be recorded and stored for later. On the next > build > > you only have to check the set of modified files, not all files. > > Registering the directory watchers takes a couple of seconds but after > > that it does not cause any noticeable CPU activity. Any file > modifications > > are reported almost at once. I do not have the framework in place to > start > > a build with this information but I would expect it to be as fast as > > compiling the modified files and linking takes. > > > > The tup website references a paper [2] in which the established top-down > > approaches are called alpha alogithms and the new bottom-up approach is > > called beta algorithm. Tup has implemented a file modification watcher > (in > > C or C++) only for Linux. On Windows it just scans all files (for which > it > > needs a little more time than my Java program, maybe it does not use more > > than one thread). > > > > > > This is something that we should keep in mind for when we ever should get > > a build solution with global dependencies and this build tool would turn > > out to be too slow. > > > > > > If can find the source code of my Java experiments at [3]. If nothing > else > > you can see an application of the ForkJoinPool that allowed my to write > the > > parallel file system scan in just a few lines. There is also an > > alternative implementation that uses the ExecutorService (with a fixed > > thread pool) which needs a few more lines of code. And there is of > course > > the use of the WatchService. > > > > It's really interesting to read these observations and test cases... we have a large and complicated source tree and just seeing what can be observed about it is fascinating to me.
Thanks for writing down your observations which I find highly interesting. > I hope your stop on writing about build does not include giving your > opinion on my ideas in the future as well. > > For the record the capstone project, and my little hobby project "Build > R.I.P." follow a third idea: > > We have a clear seperation of module build and central (total) build. +1 I would certainly go for this. A while back someone asked about ye olde "ld" approach -- all modules compiled/built and then linked later down the road. If we could somehow do something to get back to that idea in a more friendly modern way, it would certainly make working on specific areas more feasible. > The > module makefile knows how to build the module, and the central makefile > knows the relation between modules. > > The makefile in each module touched a file, and the central makefile only > controls that file. > > But youir idea of watching for changes is very interesting. > > rgds > jan I. > > > > Andre > > > > > > [1] http://gittup.org/tup/ > > [2] http://gittup.org/tup/build_system_rules_and_algorithms.pdf > > [3] http://people.apache.org/~af/test.zip > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org > > For additional commands, e-mail: dev-h...@openoffice.apache.org > > > > > -- ------------------------------------------------------------------------------------------------- MzK "Cats do not have to be shown how to have a good time, for they are unfailing ingenious in that respect." -- James Mason