On Mon, Mar 02, 2009 at 11:15:53PM -0800, Joshua ben Jore wrote:
> On Mon, Mar 2, 2009 at 12:22 PM, Nicholas Clark <n...@ccl4.org> wrote:
> > On Mon, Mar 02, 2009 at 10:23:38AM -0800, Bill Ward wrote:
> >
> >> Personally I always use hashes for objects.  Hashes are pretty fast in Perl,
> >> especially when there aren't many keys, so I don't think the benefits of
> >> using arrays are worth it.  The risk of typos is pretty small, and the
> >
> > Hash lookup should be O(1), independent of number of keys. Of course, a hash
> > with more keys uses more memory, but so does an array with more elements.
> 
> I once found some very fast code varying in something I'm guessing was
> O(n) on the length of the keys. I've occasionally wished I could get
> static lookups to compile with the hashed I32 already stashed.

There is code to do this in the peephole optimiser. For those who don't know,
shared hash key scalars store the precomputed U32 hash value. For
illustration, I'm going to use a pre-5.10 perl, as 5.8.x and earlier store
them in PVIVs, which makes them visibly distinct from regular PVs in dump
output.
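
To make that concrete for anyone who hasn't looked at the internals, here is
a minimal sketch in C. It is an illustration only, not perl's actual code:
the names shared_key, toy_hash, make_shared_key, bucket_for and
free_shared_key are all made up for this example, and FNV-1a stands in for
whatever hash function perl really uses. The point is just that hashing costs
O(key length), so storing the result next to the string lets a later lookup
skip that work.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a shared hash key: the string plus its hash. */
typedef struct {
    uint32_t hash;  /* computed once, when the key is created */
    size_t   len;
    char    *str;
} shared_key;

/* FNV-1a, used here only as a simple example hash.
 * Its cost is O(len): this is the work we want to pay only once. */
uint32_t toy_hash(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Copy the string and stash its hash alongside it. */
shared_key *make_shared_key(const char *s)
{
    shared_key *k = malloc(sizeof *k);
    if (!k)
        return NULL;
    k->len = strlen(s);
    k->str = malloc(k->len + 1);
    if (!k->str) {
        free(k);
        return NULL;
    }
    memcpy(k->str, s, k->len + 1);
    k->hash = toy_hash(s, k->len);
    return k;
}

void free_shared_key(shared_key *k)
{
    if (k) {
        free(k->str);
        free(k);
    }
}

/* Picking a bucket reads the stored hash; the string is not rescanned.
 * nbuckets is assumed to be a power of two. */
size_t bucket_for(const shared_key *k, size_t nbuckets)
{
    assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
    return k->hash & (nbuckets - 1);
}
```

With a key built once at compile time, every run-time call to bucket_for is
constant time regardless of how long the key string is.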

The code (in blead) to convert constant method names to shared hash keys is
in Perl_ck_method: http://perl5.git.perl.org/perl.git/blob/HEAD:/op.c#l7455

The code to convert hash lookups (or at least some of them) is
in Perl_peep: http://perl5.git.perl.org/perl.git/blob/HEAD:/op.c#l8568
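
As a rough picture of what such a pass does, here is a toy version in C. It
is a sketch under heavy assumptions and bears no resemblance to the real op
structures: toy_op, toy_optype, peep_hash and toy_peep are hypothetical
names, and "converting to a shared key" is reduced to computing a hash once
and flagging the constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy op types; real perl ops are far richer. */
typedef enum { TOY_OP_CONST, TOY_OP_HELEM, TOY_OP_OTHER } toy_optype;

typedef struct toy_op {
    toy_optype     type;
    struct toy_op *next;     /* next op in execution order */
    struct toy_op *key;      /* TOY_OP_HELEM only: op supplying the key */
    const char    *str;      /* TOY_OP_CONST only: the literal string */
    uint32_t       hash;     /* valid only when has_hash is set */
    int            has_hash;
} toy_op;

/* FNV-1a as an example hash; an assumption, not perl's hash function. */
uint32_t peep_hash(const char *s)
{
    uint32_t h = 2166136261u;
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Walk the op chain once. For every hash element lookup whose key is a
 * compile-time constant, precompute the key's hash.
 * Returns the number of keys converted. */
int toy_peep(toy_op *o)
{
    int converted = 0;
    for (; o; o = o->next) {
        if (o->type != TOY_OP_HELEM)
            continue;
        toy_op *k = o->key;
        if (!k || k->type != TOY_OP_CONST || k->has_hash)
            continue;  /* dynamic key, or already converted */
        assert(k->str != NULL);
        k->hash = peep_hash(k->str);
        k->has_hash = 1;
        converted++;
    }
    return converted;
}
```

Running the pass a second time over the same chain converts nothing, since
the constants are already flagged.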

However, something in ithreads (I don't know what) undoes the hash lookup
conversion. On an unthreaded 5.8.8, notice that "rules" is a PVIV, and
therefore shared:

$ ./perl -Ilib -MO=Concise -e '$perl->{rules}'
8  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v ->3
7     <2> helem vK/2 ->8
5        <1> rv2hv[t1] sKR/1 ->6
4           <1> rv2sv sKM/DREFHV,1 ->5
3              <$> gv(*perl) s ->4
6        <$> const(PVIV "rules") s/BARE ->7
-e syntax OK

Whereas the threaded 5.8.8 loses this optimisation at some point later:

$ perl -MO=Concise -e '$perl->{rules}'
8  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v ->3
7     <2> helem vK/2 ->8
5        <1> rv2hv[t2] sKR/1 ->6
4           <1> rv2sv sKM/DREFHV,1 ->5
3              <#> gv[*perl] s ->4
6        <$> const[PV "rules"] s/BARE ->7
-e syntax OK


If you have time to identify and fix that, that would be great. Method names
don't seem to suffer from this:

$ ./perl -Ilib -MO=Concise -e '$perl->rules()'
7  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v ->3
6     <1> entersub[t1] vKS/TARG ->7
3        <0> pushmark s ->4
-        <1> ex-rv2sv sKM/1 ->5
4           <$> gvsv(*perl) s ->5
5        <$> method_named(PVIV "rules") ->6
-e syntax OK
$ perl -MO=Concise -e '$perl->rules()'
7  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v ->3
6     <1> entersub[t2] vKS/TARG ->7
3        <0> pushmark s ->4
-        <1> ex-rv2sv sKM/1 ->5
4           <#> gvsv[*perl] s ->5
5        <$> method_named[PVIV "rules"] ->6
-e syntax OK


However, longer term, I'm wondering why we even do this in the peephole
optimiser, given that, worst case, we could allocate *all* barewords as
shared, straight out. (And possibly even allocate all strings from the
tokeniser as shared, given that they can now be copied as COW, and my hunch
is that strings in the tokeniser more likely than not occur more than once.)
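
For what "allocate everything as shared" could look like in the abstract,
here is a minimal interning table in C. Again this is only a sketch, not a
proposal for the actual implementation: shared_str, intern_table,
intern_hash and intern are hypothetical names, the table is a fixed size, and
nothing is ever freed. Each distinct string is stored once together with its
hash, and repeat occurrences just bump a reference count.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_BUCKETS 64  /* power of two; fixed size keeps the sketch short */

typedef struct shared_str {
    struct shared_str *next;   /* chain within one bucket */
    uint32_t           hash;   /* computed once, on first sight */
    unsigned           refcnt; /* how many times this string was requested */
    size_t             len;
    char               str[];  /* C99 flexible array member */
} shared_str;

shared_str *intern_table[INTERN_BUCKETS];

/* FNV-1a as an example hash; an assumption, not perl's hash function. */
uint32_t intern_hash(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Return the single shared copy of s, creating it on first use. */
const shared_str *intern(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    uint32_t h = intern_hash(s, len);
    shared_str **slot = &intern_table[h & (INTERN_BUCKETS - 1)];

    for (shared_str *e = *slot; e; e = e->next) {
        if (e->hash == h && e->len == len && memcmp(e->str, s, len) == 0) {
            e->refcnt++;
            return e;
        }
    }

    shared_str *fresh = malloc(sizeof *fresh + len + 1);
    if (!fresh)
        return NULL;
    fresh->hash = h;
    fresh->refcnt = 1;
    fresh->len = len;
    memcpy(fresh->str, s, len + 1);
    fresh->next = *slot;
    *slot = fresh;
    return fresh;
}
```

If the hunch about repeated strings is right, most calls would take the
early-return path and allocate nothing.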

Nicholas Clark
