On Sun, Jun 17, 2012 at 02:15:31PM +0200, Niels Thykier wrote:
> I noticed [1] and decided to check what made Lintian a "lengthy
> invariant". Processing the changes (and related files) took about a
> minute (according to the shell built-in time). Running:
>
> $ time lintian -d -C manpages allegro4-doc_4.4.2-2_all.deb
>
> takes about 40 seconds.
>
> The bottleneck appears to be our calls to "man" in checks/manpages.
> Manually running man on all the manpages takes roughly 30 seconds. As
> far as I can tell, man is "just slow" (at least with the currently
> selected options).
A good deal of this is just death-by-a-thousand-cuts rather than any single thing being desperately slow. It's not unreasonably slow for interactive use, but it's being run 823 times here, and it has to spawn a lot of subprocesses because the full warnings check necessarily involves invoking nroff, which isn't lightweight. I've never attempted to optimise the manpages check before, though, so there's some scope for easy improvements: each subprocess is expensive when you multiply them up, so let's look at which ones are obviously unnecessary. (I can't get any accurate timings just now because my backups are running.)

Setting MANROFFSEQ to empty in the environment would get rid of a call to tbl for most pages. This would make lintian stricter about pages declaring their preprocessors with '\" lines (i.e. pages that need tbl would have to say '\" t at the top), but as long as we document this in the info text for the relevant check, I would say that a bit of extra strictness is perfectly acceptable in the context of lintian, certainly if it comes with a performance advantage.

Adding the '-Tutf8 -Z' options to man would cause it to run pages through only the parsing half of the groff pipeline, without formatting them for display using grotty or processing the output through col.

On the lintian side, it would be worth taking some steps to avoid running commands via the shell (e.g. the list forms of open and exec with some manual redirections). None of these takes very long on its own, but they add up. Also, we might as well use 'gzip -cd' directly rather than going through the zcat wrapper script every time.

How far does all this get you? Given the current timings, I'd have thought that even fractional improvements would be worthwhile.

> Running man in a collection is unlikely to yield any noticeable
> improvement[2]. Even with xargs we are looking at at least 25 seconds,
> plus man is unhelpful in this case[3].
[...]
> [3] It emits errors when running with xargs that do not occur when
> running them in serial.

Can you give me an example yielding such a difference?

> The error messages all use "<standard input>" rather than a filename,
> so it will be... difficult to relate them to the original manpage.

Indeed. This is really groff being unhelpful, not man; convincing groff to output a more useful file name would appear to require man to write out a temporary file, which wouldn't be terribly clever for I/O. I suppose we could have man postprocess groff's error messages, or write out a status line at the start of processing each file so that lintian could know which file a "<standard input>" message following that line refers to, or something like that.

-- 
Colin Watson [[email protected]]
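A minimal Python sketch of the MANROFFSEQ and '-Tutf8 -Z' suggestions above. It is not lintian's code (lintian is Perl); the helper names are hypothetical, and it assumes man-db's man, whose documented preprocessor letters are e (eqn), g (grap), p (pic), r (refer), t (tbl) and v (vgrind). It only builds the command and environment, and shows how a check could tell whether a page declares tbl with a '\" t first line, which is what pages would have to do once MANROFFSEQ is empty.

```python
import os

# Preprocessor letters as documented for man-db (MANROFFSEQ, -p, '\" lines).
PREPROCESSORS = {
    "e": "eqn",
    "g": "grap",
    "p": "pic",
    "r": "refer",
    "t": "tbl",
    "v": "vgrind",
}

# A page declares its preprocessors with a first line such as: '\" t
DECLARATION_PREFIX = "'\\\" "


def declared_preprocessors(first_line: str) -> list[str]:
    """Return the preprocessors named on a page's '\\" first line, if any."""
    if not first_line.startswith(DECLARATION_PREFIX):
        return []
    words = first_line[len(DECLARATION_PREFIX):].split()
    if not words:
        return []
    # Unknown letters are ignored rather than treated as errors.
    return [PREPROCESSORS[c] for c in words[0] if c in PREPROCESSORS]


def man_check_command(path: str) -> tuple[list[str], dict[str, str]]:
    """Build argv and environment for a warnings-only man run on one file."""
    env = dict(os.environ)
    # Empty MANROFFSEQ: no default preprocessors (so no tbl) unless declared.
    env["MANROFFSEQ"] = ""
    argv = [
        "man",
        "--warnings",  # ask groff for warnings (assumed flag set for the check)
        "-l",          # treat the argument as a local file, not a page name
        "-Tutf8",      # options suggested in the mail: stop after groff's
        "-Z",          # parsing stage, skipping grotty and col
        path,
    ]
    return argv, env
```

The argument list can be passed straight to a process-spawning call with `env=env`, so no shell is involved in building it.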

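The advice above to avoid the shell and call 'gzip -cd' directly amounts to wiring a two-process pipeline by hand from argument lists. In Perl that means the list forms of open and exec; the sketch below shows the same idea in Python with the standard subprocess module. The function name is hypothetical, and discarding the consumer's standard output stands in for a manual `>/dev/null` redirection.

```python
import subprocess


def run_pipeline(first: list[str], second: list[str], env=None) -> tuple[int, bytes]:
    """Run `first | second` without a shell; return second's status and stderr.

    Both commands are argument lists, so no /bin/sh is spawned and file
    names need no quoting. The consumer's stdout is discarded.
    """
    producer = subprocess.Popen(first, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        second,
        stdin=producer.stdout,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        env=env,
    )
    # Drop our copy of the pipe so the producer sees EOF/SIGPIPE correctly.
    producer.stdout.close()
    _, errors = consumer.communicate()
    producer.wait()
    return consumer.returncode, errors
```

For example, `run_pipeline(["gzip", "-cd", "foo.1.gz"], ["man", "--warnings", "-l", "-Tutf8", "-Z", "-"])` would decompress with gzip itself rather than the zcat wrapper and hand the page to man on standard input (man-db's -l reads stdin for '-'); foo.1.gz is a placeholder name.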

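The status-line idea above would let a caller attribute groff's "<standard input>" diagnostics to files even when many pages go through one man run (as with xargs). man does not emit such a line today, so the `@@ processing <path>` marker below is purely hypothetical, as is the function name; the sketch only shows how a caller could group messages once some marker exists.

```python
import re
from collections import defaultdict

# Hypothetical status line printed before each file is processed.
MARKER = re.compile(r"^@@ processing (?P<path>.+)$")
# groff names its input "<standard input>"; an optional "troff: " prefix is allowed.
DIAGNOSTIC = re.compile(r"^(?:\S+: )?<standard input>:(?P<line>\d+): (?P<message>.*)$")


def attribute_errors(stderr_text: str) -> dict[str, list[tuple[int, str]]]:
    """Group "<standard input>" diagnostics under the most recent marker's path.

    Diagnostics seen before any marker, and lines matching neither pattern,
    are ignored.
    """
    results: dict[str, list[tuple[int, str]]] = defaultdict(list)
    current = None
    for raw in stderr_text.splitlines():
        marker = MARKER.match(raw)
        if marker:
            current = marker.group("path")
            continue
        diag = DIAGNOSTIC.match(raw)
        if diag and current is not None:
            results[current].append((int(diag.group("line")), diag.group("message")))
    return dict(results)
```

Keeping the marker on its own line means an ordinary line-by-line reader of stderr needs no extra state beyond the current file name.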