Hello, thanks to everyone who replied. Here are some conclusions of mine:

Today's file-based cache code seems to suffer from the same problems it had 7 years ago. Every time you .set() a cache entry, it asks the OS for a list of files, just to count them (for the purpose of culling). This is slow. The culling strategy is to delete a random sample of cache entries. So Russell's comment seems valid today, at least with respect to culling.

Of Django's included cache backends, apparently only memcached is suitable for a large cache in production.

Redis could be a good idea for adding persistence, but it is non-standard (not included with Django). In any case, Redis is not appropriate for my use case: I don't need the speed, so storing the information in RAM, which costs more than the filesystem, is suboptimal.

The fact that a cache knows how to get the information if it doesn't have it is an interesting observation that I hadn't thought about, but it appears to be true for most uses of "cache" that I can think of (it doesn't apply to write caches). Therefore I'm using the cache for a purpose other than the one for which it was designed, which can create all sorts of problems (such as a new administrator, or even an old one, not knowing or forgetting that they can't just delete the cache).

However, I will take my risks and continue using it for a while. For these two small projects, implementing a more complicated solution, or adding another component and thus raising the bar for other people to replace me, isn't worth it.

Antonis Christofides
http://djangodeployment.com

On 2017-05-27 12:25, Antonis Christofides wrote:
>
> Hello all,
>
> I have an application that calculates and tells you whether a specific crop at
> a specific piece of land needs to be irrigated, and how much. The calculation
> lasts for a few seconds, so I'm doing it offline with Celery. Every two hours
> new meteorological data comes in and all the pieces of land are recalculated.
>
> The question is where to store the results of the calculation.
> I thought that since they are re-creatable, the cache would be the
> appropriate place. However, there is a difference with the more common use
> of the cache: they are re-creatable, but they are also necessary. You can't
> just go and delete any item in the cache. This will cripple the website,
> which expects to find the calculation results in the cache. Viewing
> something on the site will never trigger a recalculation (and if I make it
> trigger, it will be a safety procedure for edge cases and not the normal
> way of doing things). The results must also survive reboots, so I chose the
> file-based cache.
>
> I didn't know about culling, so when the pieces of land grew to 100, and the
> items in the cache to 400 (4 items need to be stored for each piece of land),
> I spent a few hours trying to find out what the heck is going on. I solved the
> problem by tweaking the culling parameters. However all this has raised a few
> issues:
>
> 1. The filesystem cache can't grow too much because of issue 11260
>    <https://code.djangoproject.com/ticket/11260>, which is marked wontfix.
>    According to Russell Keith-Magee
>    <https://code.djangoproject.com/ticket/11260#comment:7>,
>
>        "the filesystem cache is intended as an easy way to test caching, not
>        as a serious caching strategy. The default cache size and the cull
>        strategy implemented by the file cache should make that obvious. If
>        you need a cache capable of holding 100000 items, I strongly recommend
>        you look at memcache. If you insist on using the filesystem as a
>        cache, it isn't hard to subclass and extend the existing cache."
>
>    If these comments are correct, then the documentation needs some fixing,
>    because not only does it not say that the filesystem cache is not for
>    serious use, but it implies the opposite:
>
>        "Without a really compelling reason, ... you should stick to the cache
>        backends included with Django. They’ve been well-tested and are easy
>        to use."
>
>    Is Russell not entirely correct perhaps, or is the documentation? Or am I
>    missing something?
>
> 2. In the end, is it a bad idea to use the cache for this particular case? I
>    also have a similar use case in an unrelated app: a page that needs about
>    a minute to render. Although I've implemented a quick-and-dirty solution
>    of increasing the web server's timeout and caching the page, I guess the
>    correct way would be to produce that page offline with Celery or so. Where
>    would I store such a page if not in the cache?
>
> --
> Antonis Christofides
> http://djangodeployment.com

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/ed2e1418-44ec-8a4b-d642-7b004d875325%40djangodeployment.com.
For more options, visit https://groups.google.com/d/optout.
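
For reference, the culling parameters mentioned in the thread are set through the cache's OPTIONS in the Django settings. What follows is only a minimal sketch: the directory and the numbers are illustrative assumptions, not the values actually used in these projects. Django's defaults are MAX_ENTRIES = 300 and CULL_FREQUENCY = 3, which would explain why trouble started at around 400 items.

```python
# settings.py fragment (sketch). The path and the numbers below are
# illustrative assumptions, not values taken from the thread.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/cache/myproject",  # hypothetical directory
        "TIMEOUT": None,  # None means entries never expire
        "OPTIONS": {
            # Culling kicks in once this many entries exist (default: 300).
            "MAX_ENTRIES": 100000,
            # 1/CULL_FREQUENCY of the entries are deleted per cull (default: 3).
            "CULL_FREQUENCY": 3,
        },
    }
}
```

Raising MAX_ENTRIES well above the expected number of items keeps entries from being deleted by culling, but it does not remove the cost of listing the cache directory on every .set().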

