On Tue, Jan 31, 2012 at 09:10:25PM +0100, Michael Niedermayer wrote:
> On Tue, Jan 31, 2012 at 01:10:28PM +0100, Otto Moerbeek wrote:
> > On Mon, Jan 30, 2012 at 03:18:04PM +0100, Michael Niedermayer wrote:
> >
> > > Hi
> > >
> > > Our automated tests for FFmpeg (http://fate.ffmpeg.org/) yesterday
> > > found a segfault in OpenBSD's /bin/sh.
> > >
> > > Analysis revealed that it was triggered by too many variables.
> > > That is, something like
> > >
> > > #!/bin/sh
> > >
> > > i=0
> > > while true ; do
> > > 	eval X${i}=yes
> > > 	i=$(($i+1))
> > > 	test $i -gt 17000 && break
> > > done
> > >
> > > will segfault.
> > >
> > > The following patch fixes it (we are currently using it on our
> > > OpenBSD FATE client):
> > >
> > > Index: table.h
> > > ===================================================================
> > > RCS file: /cvs/src/bin/ksh/table.h,v
> > > retrieving revision 1.7
> > > diff -u -r1.7 table.h
> > > --- table.h 11 Dec 2005 20:31:21 -0000 1.7
> > > +++ table.h 30 Jan 2012 14:13:30 -0000
> > > @@ -8,7 +8,7 @@
> > >
> > > struct table {
> > > Area *areap; /* area to allocate entries */
> > > -	short size, nfree;	/* hash size (always 2^^n), free entries */
> > > +	int size, nfree;	/* hash size (always 2^^n), free entries */
> > > struct tbl **tbls; /* hashed table items */
> > > };
> > >
> > >
> >
> > Thanks for the report.
> >
> > I could reproduce the bug, but I'm not 100% happy with your fix since
> > it just pushes the failure to a bigger number of vars.
> >
> > It's better to check for the case like below,
>
> yes, absolutely a check is needed.
> doing the check at SHRT_MAX/2, though, would likely just lead FFmpeg's
> configure to fail with "too many vars"
But what does the test actually try to find out? Is this just a test
creating many vars for no real reason, or is it something real and not
contrived?
> Some further investigation shows that code like:
> i=0
> while true ; do
> 	eval X="\$X${i}"
> 	i=$(($i+1))
> 	test $i -gt 17000 && break
> done
>
> also fails; I am not sure why ksh generates entries in the table for
> this
Will look into this.
>
> another (minor) issue I've noticed is that the hashes used are
> strongly biased toward the last chars due to
> 	h = 2*h + *n++;
> in hash(),
> which shifts the first chars into the MSBs and then out, while only the
> LSBs are used as the index
>
> as many of our variables look like
> bar_decoder_deps
> foobar_decoder_deps
> bar_decoder_checked
> foobar_decoder_checked
> ...
> the 15 LSBs of hash() tend to fall into only a few buckets, making
> the hash table somewhat slow
>
> I suspect using Bernstein's hash
> 	h = 33*h + *n++;
> would work a lot better for almost all cases,
> but I've not investigated or tested this much, and there are many hash
> functions that have better random distributions but are a lot slower.
> I just picked Bernstein's as it's closest to what is used currently
> and should be pretty much always better IMHO.
Yeah, well, that is not something for now (we're close to the lock
before the 5.1 release).
-Otto