On 08/23/10 12:08, Kirk Talman wrote:
Long lists accessed via a binary search could at one time cause paging
problems, and IBM once recommended a linear search over a binary search to
overcome those difficulties. Today's experience is that paging seldom
occurs.
Today the problem is taking advantage of caching as a table increases in
size. The cost of fetching from memory rather than from cache can wipe out
any gain from the binary search. With a cache line of 32 bytes, the cost
of doing a linear search of a table with entrylength=4 and entrycount=8 is
very low -- almost the same as doing the first compare in a binary search.
I'm astonished. But it depends on the size of the table. Once the part
of the table examined in a typical search exceeds the size of the cache,
LRU will cause the entire cache to be flushed on each search. How big is
cache?
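To make Kirk's entrylength=4 / entrycount=8 arithmetic concrete, here is a
minimal C sketch (the table contents and names are mine, purely
illustrative): the eight fullword entries occupy 32 bytes, one cache line,
so the whole linear scan costs about the same memory traffic as the first
probe of a binary search.

    #include <stdint.h>

    /* 8 fullword keys: 32 bytes, i.e. exactly one 32-byte cache line */
    static const uint32_t table[8] = { 3, 9, 17, 22, 40, 51, 77, 90 };

    /* Every compare after the first hits the line already loaded, so
       the scan costs roughly one memory fetch wherever the key falls. */
    int find_linear(uint32_t key)
    {
        for (int i = 0; i < 8; i++)
            if (table[i] == key)
                return i;
        return -1;   /* not present */
    }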
If the table is kept in a binary tree with a 4-byte key and two 4-byte
pointers (possibly relative to the table base), each entry is 12 bytes and
a million entries take 12 MB, far smaller than DASD Bill's bitmap. And
initializing a 537 MB table is a brutal hit on the cache. When the table
contains a million entries, there is only about a 0.25 probability that
any given 1024-bit cache line will contain a 1-bit.
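A sketch of that node layout, using 4-byte indexes relative to the table
base in place of full pointers (the field and function names are mine):

    #include <stdint.h>

    #define NIL UINT32_MAX                /* "no child" sentinel           */

    struct node {                         /* 12 bytes; 1M entries ~= 12 MB */
        uint32_t key;
        uint32_t left;                    /* index of left child, or NIL   */
        uint32_t right;                   /* index of right child, or NIL  */
    };

    /* Search the tree held in the array base[]; returns an index or NIL. */
    uint32_t tree_find(const struct node *base, uint32_t root, uint32_t key)
    {
        uint32_t i = root;
        while (i != NIL) {
            if (key == base[i].key)
                return i;
            i = (key < base[i].key) ? base[i].left : base[i].right;
        }
        return NIL;
    }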
If the table is not kept as a binary tree, there may be considerable gain
by reordering so the position of each entry is its collating ordinal with
the order of the bits reversed, computed over the next power of two (here
1024); a sketch of the index computation follows the example:
Key   Index   (for table size = 555)
  0       0
  1     512
  2     256
  3     768
  4     128
  5     640
  6     384
  7     896
etc.
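A minimal sketch of that index computation, assuming the physical array is
padded to the next power of two (1024 slots here), so that reversing the
low 10 bits of the collating ordinal gives the storage position:

    #include <stdint.h>

    /* Reverse the low `bits` bits of x.  With bits = 10 this reproduces
       the table above: 1 -> 512, 2 -> 256, 3 -> 768, 4 -> 128, ...      */
    static uint32_t bit_reverse(uint32_t x, int bits)
    {
        uint32_t r = 0;
        for (int i = 0; i < bits; i++) {
            r = (r << 1) | (x & 1);
            x >>= 1;
        }
        return r;
    }

    /* The entry with collating ordinal k is stored at slot
       bit_reverse(k, 10).                                               */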
If the table is aligned on a 128-byte cache line boundary, the first
five levels of a binary search (the entries at positions 1 through 31)
will happen within a single cache line, the sixth level within the next
line, etc. If the table size exceeds cache size, only the leaves are
likely to flutter.
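One way to write the search over that layout (my sketch, not from the
original note): pad the ordinal space to 1024, store an impossible high
key in the unused slots, and binary-search over ordinals while redirecting
each probe through bit_reverse() from the sketch above. The first five
probes are ordinals 512, then 256 or 768, then 128..896, and so on, which
land in slots 1 through 31 -- the first 128-byte line.

    #include <stdint.h>

    #define SLOTS 1024               /* next power of two above 555       */
    #define EMPTY UINT32_MAX         /* sentinel in slots for ordinals    */
                                     /* 555..1023; real keys must be less */

    /* slots[] holds the reordered table; bit_reverse() is defined above. */
    int find(const uint32_t *slots, uint32_t key)
    {
        uint32_t lo = 0, hi = SLOTS;                /* ordinals [lo, hi)  */
        while (lo < hi) {
            uint32_t mid = lo + (hi - lo) / 2;      /* 512, 256/768, ...  */
            uint32_t v = slots[bit_reverse(mid, 10)];
            if (v == key)
                return (int)mid;                    /* collating ordinal  */
            if (v < key)
                lo = mid + 1;
            else
                hi = mid;
        }
        return -1;                                  /* not present        */
    }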
By dividing the list into parts and indexing into the parts by either a
fast computed hash as in my prior message or directly using part of the
"key", the original speed can be restored. This is an n-dimensional
analogue to a binary tree.
Yes. Divide the table into 32 parts at the first level; into 32**n parts
at the nth level. Each array of pointers fits in a cache line. At the
leaves, use Bill's bitmap, but allocate pointer or bitmap entries only as
they're used. This cuts the size of the table to about 130 MB.
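A sketch of that structure in C, hedged: the split of key bits between
interior levels and the leaf bitmap is a free parameter, and this version
(five interior levels of 5 bits, a 128-bit leaf bitmap over the low 7
bits, ordinary 8-byte C pointers) will not reproduce the 128-byte node or
the ~130 MB figure exactly; it only shows the lazy-allocation shape.

    #include <stdint.h>
    #include <stdlib.h>

    #define LEVELS    5                    /* interior levels, 5 bits each */
    #define FANOUT    32
    #define LEAF_BITS 7                    /* 5*5 + 7 = 32 key bits        */

    struct node { void *child[FANOUT]; };                 /* lazily built  */
    struct leaf { uint8_t bits[(1u << LEAF_BITS) / 8]; }; /* 16-byte bitmap*/

    /* 5-bit digit of the key used at interior level l (l = 0 is the root). */
    static unsigned digit(uint32_t key, int l)
    {
        return (key >> (32 - 5 * (l + 1))) & (FANOUT - 1);
    }

    /* Insert: allocate pointer arrays and the leaf bitmap only when first
       touched, then set the bit picked by the low LEAF_BITS bits of key.
       (calloc failure handling omitted in this sketch.)                   */
    void trie_insert(struct node *root, uint32_t key)
    {
        struct node *n = root;
        for (int l = 0; l < LEVELS - 1; l++) {
            unsigned d = digit(key, l);
            if (n->child[d] == NULL)
                n->child[d] = calloc(1, sizeof(struct node));
            n = n->child[d];
        }
        unsigned d = digit(key, LEVELS - 1);
        if (n->child[d] == NULL)
            n->child[d] = calloc(1, sizeof(struct leaf));
        struct leaf *lf = n->child[d];
        unsigned bit = key & ((1u << LEAF_BITS) - 1);
        lf->bits[bit >> 3] |= (uint8_t)(1u << (bit & 7));
    }

    /* Membership test: a missing child anywhere on the path means absent. */
    int trie_member(const struct node *root, uint32_t key)
    {
        const struct node *n = root;
        for (int l = 0; l < LEVELS - 1; l++) {
            n = n->child[digit(key, l)];
            if (n == NULL)
                return 0;
        }
        const struct leaf *lf = n->child[digit(key, LEVELS - 1)];
        if (lf == NULL)
            return 0;
        unsigned bit = key & ((1u << LEAF_BITS) - 1);
        return (lf->bits[bit >> 3] >> (bit & 7)) & 1;
    }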
Is there a query at execution time to extract cache line size to
adjust the geometry of the pointer arrays adaptively?
Yukk! I'm designing an algorithm around the hardware. Shame
on me!
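On the runtime query: in a POSIX environment with glibc, sysconf() reports
the level-1 data-cache line size as a nonstandard extension; whether z/OS
offers an equivalent query I don't know. A minimal sketch under that
assumption:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension; sysconf()
           returns -1 or 0 where the value is unknown or unsupported.  */
        long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
        if (line > 0)
            printf("L1 data cache line: %ld bytes\n", line);
        else
            printf("cache line size not reported here\n");
        return 0;
    }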
How big is cache, typically, nowadays? Will Hiperdispatch preserve
cache contents across context switches?
-- gil