From: "Bill Stoddard" <[EMAIL PROTECTED]>
Sent: Thursday, August 16, 2001 3:33 PM
> > > My issue is that table merges are a o*n problem, while hash merges are an o*n2
>problem :(
> >
> > Hash merge is O(mn) in the worst case (the case when your hash function completely
>sucks
> > :-). The merge will tend to O(mlogn).
That varies based on the number of directory merges, as well, and how large the base
table is.
> Okay, I've thought about this some. Your veto of my patch is unreasonable.
Hmmm? Perhaps, but I committed effectively the same code an hour ago, because I'm not
going
to defend my veto, nor contribute any further to this clueless discussion. If my
discussion
points on optimizing weren't clear, then yell. Someone rewrite them who is clearer
than I.
But let's talk to the underlying argument that is: "When is a dir_merge( base ) not
const?"
I'll let the numbers decide if we continue to use tables or hashes without this
optimization.
> We should commit the simpified hash table patch I poster earlier. The hash
>implementation
> will clearly outperform tables during lookups. Hash merges are O(mlogn) and one of
>the
> hash tables is typically small so my conclusion is that the merge operation will
>probably
> not contribute significant overhead (as compared to a table merge), especially in the
> cases where merges occur often (like when we open .htaccess files, where the
>overhead of
> file io dominates the transaction).
YOU MISS THE POINT!
We are merging biglonglist with nicesmalladdition...
Because we merge, biglonglist must be recreated, and refilled one element at a time,
expanding
(potentially several times) as it goes. Doesn't matter how simple the
nicesmalladdition is,
this is a huge performance pig. One very _simple_ optimization in apr_hash would be
an
apr_hash_dup function that would really memcpy() the existing hash table (into the
same sized
allocation), it may or may not be that simple. Someone can try a patch.
Reiterating; the lhs hash isn't just memcpy'ed. It's built again from scratch.
> The simplfied hash patch is better than tables, IMHO, so it should go in.
And since only real numbers will tell us anything, I've committed the safety patches to
mod_mime.c to duplicate not only the hash table whenever we munge it, but also copy
the exinfo
members whenever we munge one of them.
A very _simple_ optimization would test that _this_dir_merge_ created the exinfo, so
we don't
copy it three times if we perform three different merge/remove operations on the same
record.
Again, anyone can feel free to offer a patch.
> I am not opposed to better implementations but we should not introduce wicked
>complexity
> for a bit extra performance, especially without benchmarking.
The hash is wickedly complex. If I had the energy to recreate my patch, it adds a
small
handful of lines. But agreed that benchmarking is the question. mod_mime merges are
now safe,
and using hashes. Benchmarking will prove it up or down. Either which way, I don't
have any
more energy for this, it's sapped a good week of my life, as it stands, on an issue
(table -> hash) that I had 0 interest in :(
Bill