Steve Underwood wrote:
> Matthew Fredrickson wrote:
>> Actually, with the way caching is done on nearly all modern processors,
>> it is debatable whether or not a look up table is the optimal way to do
>> the conversion, at least for a codec as simple as ulaw or alaw.
>> In fact, the amount of time it takes to fetch memory after a cache miss
>> can easily ruin the single element lookup performance of a look up
>> table. And if you have large tables (such as the linear to ulaw or
>> alaw table), the tradeoff of having to service a cache miss versus a few
>> cached instructions executing at native CPU clock speed makes it almost a
>> no brainer (IMHO).
>>
>> You'll pay a cache miss the first time you run the routine, but the
>> instructions making up the routine will take up much less CPU cache space
>> than the look up tables, reducing the likelihood of them being evicted
>> (whereas the lookup table, taking up a lot more space, has a much better
>> chance of causing a cache miss whenever you access it).
>>
>> Obviously, if you're running on a CPU with no cache, a look up table is
>> a good way to do it. I'm just saying that very few of the machines
>> running Asterisk have processors without caches.
>>
>> Matthew Fredrickson
>> Digium, Inc.
>>
> In spandsp I do the G.711 conversions algorithmically. Most modern
> processors have a "where is the top 1" instruction, and that reduces the
> calculations to something very fast. When I first did this it was a lot
> slower than a lookup if I tested it on its own, but faster in a real
> workload where the cache was working hard. That was in the days of 256k
> caches, though. Now that the latest Intels have 12M, the picture may be
> different. That 12M is L3 cache, which is a lot slower than the small L1
> cache, but I suspect it may mean the lookup approach is as good as
> calculation with any workload.

That's a pretty good point too. A lot of this is speculation until an
actual workload is put through the mix. I would suspect, though, that the
algorithmic approach is more likely to be faster across the wider range of
processors in use at the moment (my guess is that the bulk of them don't
have 12 MB L3 caches), as you mentioned. And if it's just a few
instructions, it could quite possibly be faster than servicing a combined
L1 and L2 cache miss (IMHO :-) ).

Matthew Fredrickson
Digium, Inc.
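
For anyone following along, here is a rough sketch of what the algorithmic ulaw conversion discussed above can look like. It is not the actual spandsp or Asterisk code. The ex_ names are made up for this example, and I'm assuming GCC or Clang for the __builtin_clz() "where is the top 1" builtin (there is a plain loop fallback for other compilers). The decoder is only there so the encoder can be checked by round trip.

```c
#include <stdint.h>

#define EX_ULAW_BIAS 0x84   /* standard G.711 mu-law bias (132) */

/* Position of the highest set bit; "bits" must be non-zero.
   Hypothetical helper: uses a compiler builtin where available. */
static inline int ex_top_bit(uint32_t bits)
{
#if defined(__GNUC__) || defined(__clang__)
    return 31 - __builtin_clz(bits);
#else
    int pos = 0;
    while (bits >>= 1)
        pos++;
    return pos;
#endif
}

/* 16 bit linear PCM to 8 bit mu-law, with no table at all. */
static inline uint8_t ex_linear_to_ulaw(int16_t sample)
{
    int linear = sample;
    int mask;
    int seg;

    /* Work on the biased magnitude; the mask restores the sign and
       applies the bit inversion that mu-law uses on the wire. */
    if (linear >= 0) {
        linear = EX_ULAW_BIAS + linear;
        mask = 0xFF;
    } else {
        linear = EX_ULAW_BIAS - linear;
        mask = 0x7F;
    }

    /* The segment number falls straight out of the top set bit. */
    seg = ex_top_bit((uint32_t) (linear | 0xFF)) - 7;
    if (seg >= 8)
        return (uint8_t) (0x7F ^ mask);   /* out of range: clip */
    return (uint8_t) (((seg << 4) | ((linear >> (seg + 3)) & 0x0F)) ^ mask);
}

/* 8 bit mu-law back to 16 bit linear PCM, also computed. */
static inline int16_t ex_ulaw_to_linear(uint8_t ulaw)
{
    int t;

    ulaw = (uint8_t) ~ulaw;
    t = (((ulaw & 0x0F) << 3) + EX_ULAW_BIAS) << ((ulaw >> 4) & 0x07);
    return (int16_t) ((ulaw & 0x80) ? (EX_ULAW_BIAS - t) : (t - EX_ULAW_BIAS));
}
```

The whole encoder is a handful of instructions and touches no data memory, which is the property being argued about in this thread. Whether it actually beats a table on a given box still needs measuring under a real workload.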
