Hi!

On Sun, 2012-12-16 at 18:47:03 +0100, Guillem Jover wrote:
> On Sun, 2012-12-16 at 17:49:46 +0100, Niels Thykier wrote:
> > Package: libdpkg-perl
> > Version: 1.16.9
> > Severity: minor
> 
> > Trying to use Dpkg::Control on a "Packages_amd64" file[1], I got a bit
> > annoyed with its performance.  I believe the problem is basically that
> > Dpkg::Control(::Hash) uses its tied hash when parsing and thereby
> > "wasting" a lot time.
> 
> Right, this seems a bit too much. I'm working right now on optimizing
> the parsing code w/o sacrificing the general interface.

Just a quick reply, the two obvious fixes I implemented locally were
to move the capitalization to the output functions, and to change the
parser to bypass the tied hash. Then, as mentioned in the other bug
report, in addition adding support for the user to ignore ordering and
always using direct hash speeds up things a bit. Here are some numbers
(although I might have messed up the tests at the time, I'll run those
again once I'm cleaning this up before pushing):


,-- master:
dpkg-break-tie: 88 wallclock secs (87.68 usr +  0.12 sys = 87.80 CPU) @  0.06/s 
(n=5)
dpkg-reuse-obj: 29 wallclock secs (29.67 usr +  0.06 sys = 29.73 CPU) @  0.17/s 
(n=5)
  dpkg-std: 88 wallclock secs (87.67 usr +  0.05 sys = 87.72 CPU) @  0.06/s 
(n=5)
               s/iter dpkg-break-tie       dpkg-std dpkg-reuse-obj
dpkg-break-tie   17.6             --            -0%           -66%
dpkg-std         17.5             0%             --           -66%
dpkg-reuse-obj   5.95           195%           195%             --
`---

,-- after commit 'Move field capitalization to output functions':
dpkg-break-tie: 41 wallclock secs (40.81 usr +  0.08 sys = 40.89 CPU) @  0.12/s 
(n=5)
dpkg-reuse-obj: 30 wallclock secs (29.55 usr +  0.12 sys = 29.67 CPU) @  0.17/s 
(n=5)
  dpkg-std: 74 wallclock secs (74.44 usr +  0.07 sys = 74.51 CPU) @  0.07/s 
(n=5)
               s/iter       dpkg-std dpkg-break-tie dpkg-reuse-obj
dpkg-std         14.9             --           -45%           -60%
dpkg-break-tie   8.18            82%             --           -27%
dpkg-reuse-obj   5.93           151%            38%             --
`---

,-- after commit 'Optimize parsing bypassing the tied hash':
  dpkg-std: 37 wallclock secs (36.89 usr +  0.12 sys = 37.01 CPU) @  0.14/s 
(n=5)
         s/iter dpkg-std
dpkg-std   7.40       --
`---

,-- after commit 'Do not preserve insertion order' (w/ preserve_order = 0):
Benchmark: timing 5 iterations of dpkg-std...
  dpkg-std: 34 wallclock secs (33.98 usr +  0.08 sys = 34.06 CPU) @  0.15/s 
(n=5)
         s/iter dpkg-std
dpkg-std   6.81       --
`---

,-- after commit 'Do not preserve insertion order' (w/ direct_hash = 1):
  dpkg-std: 33 wallclock secs (32.93 usr +  0.08 sys = 33.01 CPU) @  0.15/s 
(n=5)
         s/iter dpkg-std
dpkg-std   6.60       --
`---


,-- after all optimizations (w/ preserve_order = 1):
dpkg-break-tie: 38 wallclock secs (37.30 usr +  0.08 sys = 37.38 CPU) @  0.13/s 
(n=5)
dpkg-reuse-obj: 27 wallclock secs (26.85 usr +  0.15 sys = 27.00 CPU) @  0.19/s 
(n=5)
  dpkg-std: 37 wallclock secs (37.32 usr +  0.10 sys = 37.42 CPU) @  0.13/s 
(n=5)
               s/iter       dpkg-std dpkg-break-tie dpkg-reuse-obj
dpkg-std         7.48             --            -0%           -28%
dpkg-break-tie   7.48             0%             --           -28%
dpkg-reuse-obj   5.40            39%            38%             --
`---

,-- after all optimizations (w/ preserve_order = 0 -> no order tracking):
dpkg-break-tie: 35 wallclock secs (34.62 usr +  0.06 sys = 34.68 CPU) @  0.14/s 
(n=5)
dpkg-reuse-obj: 23 wallclock secs (22.95 usr +  0.07 sys = 23.02 CPU) @  0.22/s 
(n=5)
  dpkg-std: 35 wallclock secs (34.42 usr +  0.09 sys = 34.51 CPU) @  0.14/s 
(n=5)
               s/iter dpkg-break-tie       dpkg-std dpkg-reuse-obj
dpkg-break-tie   6.94             --            -0%           -34%
dpkg-std         6.90             0%             --           -33%
dpkg-reuse-obj   4.60            51%            50%             --
`---

,-- after all optimizations (w/ direct_hash = 1 -> no Hash::Tie):
dpkg-break-tie: 33 wallclock secs (32.84 usr +  0.07 sys = 32.91 CPU) @  0.15/s 
(n=5)
dpkg-reuse-obj: 23 wallclock secs (22.71 usr +  0.05 sys = 22.76 CPU) @  0.22/s 
(n=5)
  dpkg-std: 32 wallclock secs (32.75 usr +  0.06 sys = 32.81 CPU) @  0.15/s 
(n=5)
               s/iter dpkg-break-tie       dpkg-std dpkg-reuse-obj
dpkg-break-tie   6.58             --            -0%           -31%
dpkg-std         6.56             0%             --           -31%
dpkg-reuse-obj   4.55            45%            44%             --
`---


Thanks,
Guillem


-- 
To UNSUBSCRIBE, email to [email protected]
with a subject of "unsubscribe". Trouble? Contact [email protected]

Reply via email to