On 12/13/05, Aaron Sherman <[EMAIL PROTECTED]> wrote:
> On Mon, 2005-12-12 at 16:44 -0500, Uri Guttman wrote:
> > >>>>> "DL" == Donald Leslie {74279} <[EMAIL PROTECTED]> writes:
> >
> >   DL> I have an apache/mod-perl application that can result in large
> >   DL> xml strings which are then transformed by xslt into html. A
> >   DL> database query can result in an xml string with a length greater
> >   DL> than 300,000. In a normal perl allocation you can pre-extend the
> >   DL> string to prevent repeated new allocations and copies. Does anyone
> >   DL> know what happens in a mod-perl application? Does pre-extending
> >   DL> have any benefit?
> >
> > i can't tell you about any mod-perl issues but in general pre-extending
> > in perl doesn't gain you as much as you would think. the reason is that
> > some storage isn't truly returned to the main pool when its ref count
> > goes to 0. perl will keep it around in anticipation of its being
> > reallocated for this same item in a future call or loop iteration. so
> > it effectively does the usual doubling of its size to grow into the
> > first large string, and then it is already pre-extended the rest of
> > the time by reusing the previous buffer.
>
> Well, just in one trial, it does look like the data gets moved with
> every substantial growth.
[...]

The key word is *substantial*.

I am too lazy to look at the source code for the exact growth factor
for strings, but for a wide variety of data structures Perl uses the
strategy of allocating a fixed factor more space than has been
requested so far.  The result is that if you grow a data structure
incrementally, the costs of moving the data form a geometric series,
whose sum is no more than a constant times the final size of the data
structure.  This constant is usually small enough that it isn't worth
the effort it would take to remove it.
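
To make the geometric-series argument concrete, here is a small Perl
sketch of an idealized model.  It does not use Perl's real allocator:
the helper moved_per_char is hypothetical, and it assumes a buffer that
starts with capacity 1 and is multiplied by an integer factor r
whenever an append finds it full, copying the old contents each time.
Each move copies 1/r as much as the next one, so the total moved is
less than r/(r-1) times the final length.

```perl
use strict;
use warnings;

# Hypothetical growth model (not Perl's actual allocator): a buffer starts
# with capacity 1 and is multiplied by $factor whenever an append finds it
# full; each such move copies the whole old buffer.
sub moved_per_char {
    my ($n, $factor) = @_;
    die "factor must be an integer >= 2\n"
        unless $factor == int($factor) && $factor >= 2;
    my ($cap, $moved) = (1, 0);
    # The buffer is full (length == capacity) and must move exactly when
    # another character still has to be appended, i.e. while $cap < $n.
    while ($cap < $n) {
        $moved += $cap;     # copy the old contents to the new buffer
        $cap   *= $factor;  # grow geometrically
    }
    return $moved / $n;     # characters moved per final character
}

for my $factor (2, 3, 4) {
    printf "factor %d: %.4f chars moved per char (bound r/(r-1) = %.4f)\n",
        $factor, moved_per_char(10_000_000, $factor), $factor / ($factor - 1);
}
```

For 10,000,000 appends this model prints 1.6777, 0.7174 and 0.5592
characters moved per character for factors 2, 3 and 4.  It counts only
the moves, not the initial write of each character, and a larger factor
trades fewer moves for more unused slack in the buffer.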

If you really think that it *is* worth the effort that it would take
to remove that insignificant overhead, then I'm going to go out on a
limb and say that you have an application which shouldn't be written
in Perl.

OK, I won't look up the source, but I will demonstrate what I am
talking about.  It seems from this that strings are slightly different
from arrays, but the effect is similar:

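#! /usr/bin/perl -l
use strict;
use warnings;
use Config;

# pack "p" yields a native pointer: 4 bytes ("L") or 8 bytes ("Q").
my $ptr_fmt = $Config{ptrsize} == 8 ? "Q" : "L";

my $s = "";
my $old_loc = 0;
for (1..10_000_000) {
  $s .= "1";
  my $loc = loc($s);
  # Report only when the string's buffer has moved to a new address.
  if ($loc != $old_loc) {
    print "Len: " . length($s) . "; loc $loc";
    $old_loc = $loc;
  }
}

# Address of the string buffer behind $_[0] (an alias for the caller's variable).
sub loc {
  unpack $ptr_fmt, pack "p", $_[0]
}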
__END__
Len: 1; loc 135673368
Len: 12; loc 135709104
Len: 36; loc 135708936
Len: 52; loc 135698368
Len: 68; loc 135671296
Len: 172; loc 135702616
Len: 2060; loc 135713768
Len: 134156; loc 3083567112
Len: 135156; loc 3083427848
Len: 274420; loc 3083149320
Len: 552948; loc 3082592264
Len: 1110004; loc 3081478152
Len: 2224116; loc 3079249928
Len: 4452340; loc 3074793480
Len: 8908788; loc 3065880584

In case you're interested, that came out to 1.58 copies/character.  If
you tried to assign a long string to pre-extend, truncate it, and then
incrementally build the string you were really interested in, it
would come out to 2 copies/character, and you'd lose in your attempt
to optimize!

Cheers,
Ben
 
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
