Package: dpkg-dev
Version: 1.15.4.1
Severity: normal
File: /usr/bin/dpkg-source
Tags: patch
As seen in (but unrelated to) bug #554612, running dpkg-source -b is slow on
big trees, where the main contender is Dpkg::Source::Patch::add_diff_file.
%Time Sec. #calls sec/call F name
32.52 69.8110 44702 0.001562 Dpkg::Source::Patch::add_diff_file
29.95 64.2837 44829 0.001434 Dpkg::IPC::fork_and_exec
10.22 21.9370 44829 0.000489 Dpkg::IPC::wait_child
7.14 15.3309 97591 0.000157 File::Spec::Unix::abs2rel
4.25 9.1206 585681 0.000016 File::Spec::Unix::canonpath
Here, the main problem is obviously forking 44829 times diff -u, while the
vast majority of files in the orig tarball haven't been touched (which is
mostly true on all packages).
The attached patch (which may have style and correctness issue) implements
a very simple check in perl (so, without a fork) to see if files differ
before running diff. The result is stunning:
>From 3 minutes and 30 seconds on iceape in format 3.0 (quilt), dpkg-source
-b goes down to 35 seconds (where 15 are spent bunzipping).
This is where the time is spent, now:
%Time Sec. #calls sec/call F name
24.41 14.1649 128 0.110663 Dpkg::IPC::wait_child
19.46 11.2948 97594 0.000116 File::Spec::Unix::abs2rel
13.77 7.9901 44703 0.000179 Dpkg::Source::Patch::add_diff_file
9.37 5.4382 585699 0.000009 File::Spec::Unix::canonpath
It looks much better.
My gut feeling is that it should improve run time on most if not all packages.
(and more importantly, will help big packages maintainers)
I'm pretty sure reading by blocks of a multiple of 4k instead of reading line
by line could be faster, too.
-- System Information:
Debian Release: squeeze/sid
APT prefers unstable
APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.31-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages dpkg-dev depends on:
ii binutils 2.20-2 The GNU assembler, linker and bina
ii bzip2 1.0.5-3 high-quality block-sorting file co
ii dpkg 1.15.4.1 Debian package management system
ii libtimedate-perl 1.1900-1 Time and date functions for Perl
ii lzma 4.43-14 Compression method of 7z format in
ii make 3.81-7 An utility for Directing compilati
ii patch 2.5.9-5 Apply a diff file to an original
ii perl [perl5] 5.10.1-6 Larry Wall's Practical Extraction
ii perl-modules 5.10.1-6 Core Perl modules
Versions of packages dpkg-dev recommends:
ii build-essential 11.4 Informational list of build-essent
ii fakeroot 1.14.3 Gives a fake root environment
ii gcc [c-compiler] 4:4.3.3-9+nmu1 The GNU C compiler
ii gcc-4.1 [c-compiler] 4.1.2-27 The GNU C compiler
ii gcc-4.2 [c-compiler] 4.2.4-6 The GNU C compiler
ii gcc-4.3 [c-compiler] 4.3.4-6 The GNU C compiler
ii gcc-4.4 [c-compiler] 4.4.2-2 The GNU C compiler
ii gnupg 1.4.10-2 GNU privacy guard - a free PGP rep
ii gpgv 1.4.10-2 GNU privacy guard - signature veri
Versions of packages dpkg-dev suggests:
ii debian-keyring [debian-mainta 2009.08.27 GnuPG (and obsolete PGP) keys of D
-- no debconf information
-- debsums errors found:
debsums: changed file /usr/share/perl5/Dpkg/Source/Patch.pm (from dpkg-dev
package)
debsums: changed file /usr/share/perl5/Dpkg/Source/Package/V2.pm (from dpkg-dev
package)
--- /usr/share/perl5/Dpkg/Source/Patch.pm
+++ /usr/share/perl5/Dpkg/Source/Patch.pm
@@ -58,6 +58,19 @@
sub add_diff_file {
my ($self, $old, $new, %opts) = @_;
+ open(OLD, "<", $old);
+ open(NEW, "<", $new);
+ my $match = 1;
+ while (<OLD>) {
+ if ($_ ne <NEW>) {
+ $match = 0;
+ last;
+ }
+ }
+ close OLD;
+ close NEW;
+ return 1 if ($match);
+
$opts{"include_timestamp"} = 0 unless exists $opts{"include_timestamp"};
my $handle_binary = $opts{"handle_binary_func"} || sub {
my ($self, $old, $new) = @_;