On Thursday, 4 August 2016 at 11:35:41 UTC, crimaniak wrote:
On Tuesday, 2 August 2016 at 22:06:38 UTC, Mark "J" Twain wrote:
Instead, a better solution would be to use variables:

if (n*length > m*capacity) expand(l*length)

Some time ago I played with self-optimizing cache layer.
Problem: the time to obtain cache items is unknown and server-dependent. For example, the network can be involved; sometimes it is localhost, sometimes a server on another continent. The time to cache an item is unknown too: for example, if the memcached extension is not installed on a server, the fallback to the disk backend slows the cache down a lot. So we don't know in advance whether it is worth caching at all.
Solution: the cache measures its own performance and acts accordingly.


I did not think of this in terms of networking. I suppose it can be used there too but, as you mention, latency could be a factor.

So I did the following:
1. A functional interface instead of a save/load-type interface: cacheLevel.get(functor_to_get_item) allows measuring the time it takes to obtain an item.
2. All measuring/controlling logic lives in a separate class behind an AbstractStatist interface; the CacheLevel class only calls hooks on it. By swapping the AbstractStatist implementation I can switch between measuring statistics and using them, or work as a plain cache (EmptyStatist with all hooks empty). My implementation skips all items not worth caching (calcTime/calcHits*totalHits-totalTime < 0), but smarter things are possible (like caching only the most efficient items that fit in the cache size).
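If I read that right, a rough sketch of those hooks in D would look something like the following. AbstractStatist, EmptyStatist, CacheLevel and get are from your description; the method names and the in-memory backend are my guesses at the details.

import std.datetime.stopwatch : StopWatch, AutoStart;
import core.time : Duration;

// Hook interface: the cache level only reports events, all policy lives here.
interface AbstractStatist
{
    bool worthCaching(string key, Duration calcTime);
    void registerHit(string key);
    void registerMiss(string key, Duration calcTime);
}

// "Work as usual cache": all hooks empty, everything gets cached.
class EmptyStatist : AbstractStatist
{
    bool worthCaching(string key, Duration calcTime) { return true; }
    void registerHit(string key) {}
    void registerMiss(string key, Duration calcTime) {}
}

class CacheLevel
{
    private string[string] store;      // toy backend: in-memory strings
    private AbstractStatist statist;

    this(AbstractStatist statist) { this.statist = statist; }

    // Functional interface: we get to time the functor that produces the item.
    string get(string key, string delegate() functor)
    {
        if (auto p = key in store) { statist.registerHit(key); return *p; }

        auto sw = StopWatch(AutoStart.yes);
        auto value = functor();
        sw.stop();

        statist.registerMiss(key, sw.peek);
        if (statist.worthCaching(key, sw.peek))
            store[key] = value;
        return value;
    }
}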

If it is more complex than what I described, it would have to be thought out a great deal. My goal was to simply have the program expose optimization points (variables) and then allow an optimizer to change those to find better points. The program itself would be virtually unmodified: no code to interact with the optimization process except using variables instead of constants (which is minimal and necessary).
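Concretely, on the program side that is about all I have in mind. The Tunable struct and every name below are hypothetical, just to illustrate the shape:

// Hypothetical sketch: a constant becomes a named, bounded "optimization point"
// that an external optimizer is allowed to nudge between runs.
struct Tunable
{
    string name;
    double value;       // current setting, used where the constant used to be
    double min, max;    // allowed range
    double step;        // granularity the optimizer may move it by
}

// Instead of `if (length * 2 > capacity) expand(length * 3)` with hard-coded 2 and 3:
Tunable growThreshold = Tunable("growThreshold", 2.0, 1.1, 4.0, 0.1);
Tunable growFactor    = Tunable("growFactor",    3.0, 1.5, 8.0, 0.5);

void maybeExpand(ref size_t length, ref size_t capacity)
{
    if (length * growThreshold.value > capacity)
        capacity = cast(size_t)(length * growFactor.value);
}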

Exposing an interface for the program itself to guide the optimization process seems like a lot more work. But, of course, ultimately is better as it allows more information to flow in to the optimization process. But this design is beyond what I'm willing to achieve(this way could be months or years to get done right), while my method could take just a few hours to code up, and is rather general, although a bit dumb(fire and forget and hope for the best).

From my experience I can draw some conclusions:
1. Measure mode and control mode need to be separated. You can't gather accurate statistics while changing the system's behavior according to the current state of those statistics.
2. Statistics differ between applications, but for a specific application under specific conditions they can, in most cases, be approximated as constant.
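Put into code, I imagine the separation in point 1 looking roughly like this. It is only a sketch; the enum, the names and the caching criterion are all mine, not your formula:

// Sketch: statistics are only acted on once gathered with the policy frozen.
enum CacheMode { measure, control }

struct Stats
{
    ulong hits;
    ulong misses;
    double calcTime = 0;   // total seconds spent producing missed items
}

struct SelfTuningPolicy
{
    CacheMode mode = CacheMode.measure;
    Stats frozen;                  // snapshot used for decisions in control mode
    Stats current;                 // always being updated
    double cacheOverhead = 0.001;  // assumed cost of one store+lookup, seconds

    // Freeze the statistics gathered so far and start acting on them.
    void promote()
    {
        frozen = current;
        current = Stats.init;
        mode = CacheMode.control;
    }

    bool shouldCache()
    {
        // Measure mode: behave like a plain cache and just collect numbers;
        // never steer behavior from half-collected, self-influenced statistics.
        if (mode == CacheMode.measure) return true;
        if (frozen.misses == 0) return true;
        // Control mode: worth caching if producing an item costs more on
        // average than caching it (toy criterion).
        return frozen.calcTime / frozen.misses > cacheOverhead;
    }
}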


Yes, it is tricky to make the algorithm stable. This is why I think a simple optimizer would need to do this over long periods (months of program use). Because there are so many aberrations (other programs, user behavior, etc.), these can only be removed statistically by repeated use. Essentially, low-pass the data to remove all the spikes, then compare the averaged result with the previous one.
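Concretely I picture something as dumb as an exponential moving average per setting, only compared after many samples. A sketch, with made-up numbers:

// Sketch: smooth noisy per-run measurements (run time, memory, ...) with an
// exponential moving average so single spikes don't steer the optimizer.
struct SmoothedMetric
{
    double ema = 0;
    ulong samples = 0;
    enum double alpha = 0.01;   // small alpha == aggressive low-pass

    void add(double observation)
    {
        ema = samples == 0 ? observation : alpha * observation + (1 - alpha) * ema;
        ++samples;
    }

    // Only trust a comparison after enough repeated uses.
    bool betterThan(const SmoothedMetric previous, ulong minSamples = 1000) const
    {
        return samples >= minSamples
            && previous.samples >= minSamples
            && ema < previous.ema;   // lower cost (time, memory) is better
    }
}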


So for the array allocation strategy a more realistic scenario is, I think, the following:
1. The application is compiled in some 'array_debug' mode: a statist trait added to the array collects usage statistics and writes optimal constants at application exit.
2. The programmer configures the array allocator in the application according to these constants.
3. The application is built in release mode with the optimal allocation strategy and no statist overhead, and works fast. Users are happy.
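A sketch of what step 1 could look like in D; the array_debug version identifier and the counters are just an example, the real statist trait would track whatever the allocator needs:

// Sketch: in an "array_debug" build, collect usage statistics and dump them
// at program exit so the programmer can derive allocator constants from them.
version (array_debug)
{
    import std.stdio : stderr;

    struct ArrayStats
    {
        size_t appends;
        size_t reallocations;
        size_t maxLength;
    }

    __gshared ArrayStats arrayStats;

    // The instrumented append would bump these counters.
    void noteAppend(size_t newLength, bool reallocated)
    {
        arrayStats.appends++;
        if (reallocated) arrayStats.reallocations++;
        if (newLength > arrayStats.maxLength) arrayStats.maxLength = newLength;
    }

    shared static ~this()
    {
        // "Writes optimal constants at the application exit" -- here just the
        // raw numbers a programmer could turn into allocator settings.
        stderr.writefln("array_debug: appends=%s reallocations=%s maxLength=%s",
                        arrayStats.appends, arrayStats.reallocations, arrayStats.maxLength);
    }
}

The build for step 1 would then use something like dmd -version=array_debug, and the release build of step 3 simply omits it.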

This is a more static, profiling type of optimization. I am talking about something a bit different. Both methods could be used together for a better result, but mine is aimed at simplicity. We generally set constants blindly for things that affect performance. Let's simply turn those constants into variables and let a global blind optimizer try to find better values than the ones we "blindly" set. There is no guarantee it would find a better result, and it may even introduce program instability. But all of this can be roughly measured by CPU and memory usage, and given enough parameters there are probably at least several optimal points the optimizer could find.
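The optimizer side could be as blind as the following. Again only a sketch: Tunable is the same hypothetical struct as in the earlier sketch, repeated here to keep this self-contained.

import std.random : uniform;

struct Tunable { string name; double value, min, max, step; }

// Between runs, nudge one exposed variable by one step inside its range.
void nudge(ref Tunable t)
{
    immutable delta = uniform(0, 2) ? t.step : -t.step;   // coin flip: up or down
    immutable candidate = t.value + delta;
    if (candidate >= t.min && candidate <= t.max)
        t.value = candidate;
}

// Keep the change only if the smoothed cost (time, memory, ...) went down.
void acceptOrRevert(ref Tunable t, double previousValue, double newCost, double oldCost)
{
    // Fire and forget, hope for the best: no model, just keep what measured better.
    if (newCost >= oldCost)
        t.value = previousValue;
}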

Ultimately, for just a little work (setting the variables and specifying their ranges and step, say), we could have most programs being created, in D for now at least, optimizing themselves to some degree while they are being used, after they have been shipped. This, I believe, is unheard of. It represents the next level in program optimization.

Imagine one day a program that could optimize itself depending on the user's hardware, the user's habits, etc. Well, this method attempts to get that ball rolling and does all those things in a general way (albeit an ignorant one, but maybe just as effective). It's hard to know how effective until it is done, but we do know it can't be any worse than what we already have.





