I'll implement the following for Leo 4.8, probably first in the until-4-7-final branch. There is no hurry to do this--the present caching scheme is quite good as it is. It should *not* be done for Leo 4.7: the trunk will contain only bug fixes until Leo 4.7 final goes out the door.
At the end of the "Improved caching for rc1?" thread I said:

QQQ
It's ironic that all the recent caching work has added essentially nothing to Leo's caching capabilities. However, I am quite pleased with the work, for several reasons
QQQ

I neglected the most important reason. In the process of immersing myself in the lowest-level details of the code, I primed my subconscious to think expansively about the problem. This is an example of what I call "contraction followed by expansion" thinking. It is only after getting stuck that one can get unstuck.

Sitting in the bath last night, I considered what the present scheme does, and how it could be improved. Here is a revision of the notes I made after the bath.

Terminology: **top-level folders** are direct subfolders of .leo/db. Top-level folders represent file *locations*, not file contents. The names of top-level folders have the form x_y, where x is the short file name and y is a hashlib key corresponding to the full path to the file. Exception: the top-level "globals" folder represents g.app.db. This contains minor data.

At present, top-level folders contain various subdirectories. Details don't matter, because we can dispense with them all. This is the substance of the new design.

In the new design, a top-level folder will contain only two files:

- contents_<key>: the contents of the file. Call this the **contents** file.
- data_<key>: a dict representing the "minor data" of the file: <globals> element stuff, expansion bits, etc. Call this the **data** file.

Here <key> is the hashlib key (returned by cacher.fileKey) of the entire contents of the file.

The top-level folder will contain cached data only for the latest version of a file. If Leo should somehow try to load an older version of a cached file, the cacher class will reload the entire file, as it should. But this will seldom if ever happen.

For any top-level directory, and for any particular <key>, Leo will only ever write the contents file once.
The proof is immediate. The <key> depends on the entire contents of the file.

Otoh, Leo (that is, the cacher) can write data_<key> as many times as desired. The Aha: this is perfectly safe. The data in the data file can never get out of sync with the contents of the contents file, because any change to the contents would change the <key>.

Rather than writing "minor" data to a plethora of directories and files, the cacher will write a single dict containing all minor data to the data file. It's as simple as that: the cacher class can easily "queue" all data for writing.

That's it. Imo, there are no down sides to the new scheme. The up sides:

- It will be easier for humans to understand the contents of the cache and to understand file modification dates.
- This scheme will simplify or even eliminate the complex path-manipulation code in PickleShareDB. The cacher will only ever create top-level directories.
- At present, the clear-all-caches command is wimpy. In the new scheme it can *safely* clear all top-level directories.
- The cacher can safely use g.makeAllNonExistentDirectories to make top-level directories. This can be unit tested safely as well.

Edward
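
A rough sketch of the scheme described above, in Python, may help make the two-file layout concrete. It is not Leo's actual cacher code. The function names (file_key, top_level_folder, write_cache, read_cache), the use of SHA-256 for both keys, pickling the dict, and storing the raw file bytes in the contents file are all assumptions made for illustration. Only the x_y folder naming, the contents_<key> and data_<key> file names, and the write-once rule come from the notes.

```python
import hashlib
import os
import pickle


def file_key(contents):
    """Hypothetical stand-in for cacher.fileKey: a hash of the entire contents.

    SHA-256 is an assumption; the notes only say "hashlib key".
    """
    return hashlib.sha256(contents).hexdigest()


def top_level_folder(db_root, path):
    """Return the x_y folder: x is the short file name, y hashes the full path."""
    full_path = os.path.abspath(path)
    path_key = hashlib.sha256(full_path.encode("utf-8")).hexdigest()
    return os.path.join(db_root, "%s_%s" % (os.path.basename(full_path), path_key))


def write_cache(db_root, path, contents, minor_data):
    """Cache `contents` (bytes) and `minor_data` (a dict) for the file at `path`."""
    folder = top_level_folder(db_root, path)
    os.makedirs(folder, exist_ok=True)
    key = file_key(contents)
    # Keep cached data only for the latest version of the file.
    for name in os.listdir(folder):
        if not name.endswith("_" + key):
            os.remove(os.path.join(folder, name))
    # The contents file is written at most once per key.
    contents_file = os.path.join(folder, "contents_" + key)
    if not os.path.exists(contents_file):
        with open(contents_file, "wb") as f:
            f.write(contents)
    # The data file may be rewritten freely: it is tied to the same key,
    # so it cannot drift out of sync with the contents file.
    with open(os.path.join(folder, "data_" + key), "wb") as f:
        pickle.dump(minor_data, f)
    return key


def read_cache(db_root, path, contents):
    """Return (cached_contents, minor_data), or None if the key is not cached."""
    folder = top_level_folder(db_root, path)
    key = file_key(contents)
    contents_file = os.path.join(folder, "contents_" + key)
    data_file = os.path.join(folder, "data_" + key)
    if not (os.path.exists(contents_file) and os.path.exists(data_file)):
        return None  # Caller falls back to reloading the entire file.
    with open(contents_file, "rb") as f:
        cached = f.read()
    with open(data_file, "rb") as f:
        minor_data = pickle.load(f)
    return cached, minor_data
```

In this sketch, calling write_cache twice with the same contents but different minor data rewrites only the data file, while calling it with changed contents replaces both files under the new key.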
