On Mon, Mar 02, 2009 at 11:15:53PM -0800, Joshua ben Jore wrote:
> On Mon, Mar 2, 2009 at 12:22 PM, Nicholas Clark <n...@ccl4.org> wrote:
> > On Mon, Mar 02, 2009 at 10:23:38AM -0800, Bill Ward wrote:
> >
> >> Personally I always use hashes for objects. Hashes are pretty fast in
> >> Perl,
> >> especially when there aren't many keys, so I don't think the benefits of
> >> using arrays are worth it. The risk of typos is pretty small, and the
> >
> > Hash lookup should be O(1), independent of number of keys. Of course, a hash
> > with more keys uses more memory, but so does an array with more elements.
>
> I once found some very fast code varying in something I'm guessing was
> O(n) on the length of the keys. I've occasionally wished I could get
> static lookups to compile with the hashed I32 already stashed.

There is code to do this in the peephole optimiser. For those who don't know,
shared hash key scalars store the precomputed U32 hash value. For illustration,
I'm going to use pre 5.10, as 5.8.x and earlier store them in PVIVs, which
makes them visibly distinct from regular PVs in dump output.

The code (in blead) to convert constant method names to shared hash keys is in
Perl_ck_method:

http://perl5.git.perl.org/perl.git/blob/HEAD:/op.c#l7455

The code to convert hash lookups (or at least some of them) is in Perl_peep:

http://perl5.git.perl.org/perl.git/blob/HEAD:/op.c#l8568

However, something in ithreads, I know not what, undoes this one.

So, for an unthreaded 5.8.8, notice that "rules" is a PVIV, so shared:

$ ./perl -Ilib -MO=Concise -e '$perl->{rules}'
8  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v ->3
7     <2> helem vK/2 ->8
5        <1> rv2hv[t1] sKR/1 ->6
4           <1> rv2sv sKM/DREFHV,1 ->5
3              <$> gv(*perl) s ->4
6        <$> const(PVIV "rules") s/BARE ->7
-e syntax OK

Whereas the threaded 5.8.8 loses this optimisation at some point later:

$ perl -MO=Concise -e '$perl->{rules}'
8  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v ->3
7     <2> helem vK/2 ->8
5        <1> rv2hv[t2] sKR/1 ->6
4           <1> rv2sv sKM/DREFHV,1 ->5
3              <#> gv[*perl] s ->4
6        <$> const[PV "rules"] s/BARE ->7
-e syntax OK

If you have time to identify and fix that, that would be great.

Method names don't seem to suffer from this:

$ ./perl -Ilib -MO=Concise -e '$perl->rules()'
7  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v ->3
6     <1> entersub[t1] vKS/TARG ->7
3        <0> pushmark s ->4
-        <1> ex-rv2sv sKM/1 ->5
4           <$> gvsv(*perl) s ->5
5        <$> method_named(PVIV "rules") ->6
-e syntax OK

$ perl -MO=Concise -e '$perl->rules()'
7  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v ->3
6     <1> entersub[t2] vKS/TARG ->7
3        <0> pushmark s ->4
-        <1> ex-rv2sv sKM/1 ->5
4           <#> gvsv[*perl] s ->5
5        <$> method_named[PVIV "rules"] ->6
-e syntax OK

However, longer term, I'm wondering why we even do this in the peephole
optimiser, given that, worst case, we could allocate *all* bare words as
shared, straight out. (And possibly even allocate all strings from the
tokeniser as shared, given that they can now be copied as COW, and my hunch is
that strings in the tokeniser more likely than not occur more than once.)

Nicholas Clark
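
For readers unfamiliar with the idea of a key that carries its own precomputed
hash, a minimal C sketch may help. It is not the code in op.c or hv.c: the
names shared_key, make_shared_key, fetch_plain and fetch_shared are
hypothetical, the hash function is FNV-1a rather than perl's, and the table is
a toy with linear probing. It only shows why a constant key that already holds
its hash value avoids the O(length of key) hashing step on every lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a shared hash key scalar:
 * the key string together with its hash, computed once. */
typedef struct {
    const char *str;
    size_t      len;
    uint32_t    hash;
} shared_key;

/* Counts how many times a string is actually hashed. */
static unsigned long hash_calls = 0;

/* FNV-1a, for illustration only (assumption: NOT perl's hash function).
 * Cost is O(len), which is the per-lookup cost we want to avoid. */
static uint32_t hash_bytes(const char *s, size_t len) {
    uint32_t h = 2166136261u;
    hash_calls++;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Done once, e.g. at compile time for a constant key. */
static shared_key make_shared_key(const char *s) {
    shared_key k;
    assert(s != NULL);
    k.str  = s;
    k.len  = strlen(s);
    k.hash = hash_bytes(s, k.len);
    return k;
}

#define TABLE_SIZE 16 /* power of two, so we can mask instead of mod */

typedef struct {
    int        used;
    shared_key key;
    int        value;
} slot;

typedef struct {
    slot slots[TABLE_SIZE];
} table;

/* Linear probing: returns the matching slot, the first free slot,
 * or NULL if the table is full and the key is absent. */
static slot *find_slot(table *t, const char *s, size_t len, uint32_t hash) {
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        slot *sl = &t->slots[(hash + i) & (TABLE_SIZE - 1)];
        if (!sl->used)
            return sl;
        if (sl->key.hash == hash && sl->key.len == len &&
            memcmp(sl->key.str, s, len) == 0)
            return sl;
    }
    return NULL;
}

/* Returns 1 on success, 0 if the table is full. */
static int table_store(table *t, shared_key k, int value) {
    slot *sl = find_slot(t, k.str, k.len, k.hash);
    if (sl == NULL)
        return 0;
    sl->used  = 1;
    sl->key   = k;
    sl->value = value;
    return 1;
}

/* Lookup by plain string: the key is hashed again on every call. */
static const int *fetch_plain(table *t, const char *s) {
    size_t len = strlen(s);
    slot *sl = find_slot(t, s, len, hash_bytes(s, len));
    return (sl != NULL && sl->used) ? &sl->value : NULL;
}

/* Lookup by shared key: the cached hash is reused, no rehashing. */
static const int *fetch_shared(table *t, const shared_key *k) {
    slot *sl = find_slot(t, k->str, k->len, k->hash);
    return (sl != NULL && sl->used) ? &sl->value : NULL;
}
```

In this sketch a zero-initialised table is empty. Storing a value under
make_shared_key("rules") and then calling fetch_shared leaves hash_calls
unchanged, whereas each fetch_plain call increments it. The final string
comparison on a hit is still performed here; the saving is only the hashing.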