On Thu, Apr 3, 2014 at 3:41 PM, Pedro Côrte-Real <pe...@pedrocr.net> wrote:
> On Wed, Apr 2, 2014 at 9:08 PM, Pedro Côrte-Real <pe...@pedrocr.net> wrote:
>> Having read through the code in more detail here's a possible
>> suggestion on how to do the minimum possible thing that may just work:
>>
>> Leave the DT_MIPMAP_F and DT_MIPMAP_FULL levels just as they are.
>> For levels DT_MIPMAP_0 through DT_MIPMAP_3:
>> 1) whenever an image is about to be removed from the cache, write it
>> out to disk first
>> 2) whenever you have a cache miss try to see if the image is on disk
>> before recreating it from the original image
>> 3) whenever an image gets changed remove it from the disk
>> 4) potentially change the sizes so that DT_MIPMAP_F can be a large
>> size and yet the thumbnail levels be smaller (say 800x600 or lower)

Went ahead and built this:

https://github.com/pedrocr/darktable/tree/diskcache
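
For anyone who doesn't want to read the branch, the rough shape of it
is the sketch below. Only the DT_MIPMAP_0..DT_MIPMAP_3 levels and the
evict/miss/invalidate flow come from the plan quoted above; the
function names, signatures and on-disk layout are made up for this
mail, they are not the actual code.

/* rough sketch only -- the real branch hooks into darktable's mipmap
 * cache; everything here except the DT_MIPMAP_0..3 levels and the
 * general flow is invented for illustration */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* hypothetical on-disk location of one cached thumbnail level */
static void cache_filename(char *out, size_t len, uint32_t imgid, int level)
{
  const char *home = getenv("HOME");
  snprintf(out, len, "%s/.cache/darktable/mipmaps/%d-%u",
           home ? home : ".", level, (unsigned)imgid);
}

/* 2) on a cache miss, try the disk copy before recomputing from the raw */
static int filebacked_tryget(uint32_t imgid, int level, void *buf, size_t size)
{
  char fname[512];
  cache_filename(fname, sizeof(fname), imgid, level);
  FILE *f = fopen(fname, "rb");
  if(!f) return 1;                        /* not on disk -> caller recomputes */
  const size_t rd = fread(buf, 1, size, f);
  fclose(f);
  return rd == size ? 0 : 1;
}

/* 1) just before a slot gets evicted from the in-memory cache, write it out */
static void filebacked_put(uint32_t imgid, int level, const void *buf, size_t size)
{
  char fname[512];
  cache_filename(fname, sizeof(fname), imgid, level);
  FILE *f = fopen(fname, "wb");
  if(!f) return;                          /* the disk cache is best-effort */
  fwrite(buf, 1, size, f);
  fclose(f);
}

/* 3) when an image is changed, drop the stale disk copies */
static void filebacked_invalidate(uint32_t imgid)
{
  char fname[512];
  for(int level = 0; level <= 3; level++) /* DT_MIPMAP_0 .. DT_MIPMAP_3 */
  {
    cache_filename(fname, sizeof(fname), imgid, level);
    remove(fname);
  }
}

Step 4) is just a change to the size definitions, so there's nothing to
sketch there.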

> Here are the results with thumbnails being calculated from half-size raws
>
> 10 images - 7.117s cold, 0.505s hot
> 100 images - 74.056s cold, 1.257s hot
> 1000 images - 1439.562s cold, 1446.595s hot

Here are the same results now. Same machine, DT_MIPMAP_1 again with 256
slots, but at a slightly lower resolution (I forgot I had capped the
size of DT_MIPMAP_3 to 640x480 vs the 800x600 of the previous test).

10 images - 8.319s cold, 0.833s hot
100 images - 85.701s cold, 1.83s hot
1000 images - 1597.838s cold, 8.637s hot, 1.366s hotter (a third run)

So now the cache stays helpful even with a large number of images. We
pay a penalty for that in the cold case: the time spent writing the
thumbnails out to disk. The "hotter" case is a third run, by which
point the first two runs have saved all 1000 images to disk. The first
run only writes out the first 744; the other 256 are still in the
memory cache, which gets serialized to disk on exit. On the second run
those 256 come back from the serialized cache and are slowly evicted
and written out as the other 744 are loaded back from disk. On the
third run all 1000 are already on disk, so no writing is needed.

The gist of it: we pay something like 10-15% overhead in the cold case
to get up to a ~1100x speedup in the hot case. This setup has a
USB2-attached disk, so the overhead is potentially overestimated, as
disk writing is slower than it should be. On the other hand, the two
tests aren't completely comparable because of my 640x480 vs 800x600
screwup.
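
To put actual numbers on that, for the 1000-image runs above:

  cold overhead:  (1597.838 - 1439.562) / 1439.562  ~ 11%
  hot speedup:     1597.838 / 1.366                 ~ 1170x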

I'd love some feedback on the code. In particular, I'm not sure the
locking in dt_cache_read_get is still correct around the calls to
dt_cache_filebacked_tryget. Since I'm changing the bucket data, maybe I
need a dt_cache_bucket_write_lock? Or is the fact that we're still
setting up the bucket at that point enough that we don't need a write
lock?
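
To make the question concrete, what I'm doing boils down to something
like this (generic pthread sketch, not the actual cache code;
dt_cache_bucket_write_lock doesn't exist, it's just what I'd add):

#include <pthread.h>
#include <stddef.h>

struct bucket
{
  pthread_rwlock_t lock;
  void *data;
  size_t size;
};

/* If the bucket is already visible to other threads when the file-backed
 * read fills in its data, that fill is a write and wants the write lock: */
static int fill_from_disk(struct bucket *b,
                          int (*tryget)(void *buf, size_t size))
{
  pthread_rwlock_wrlock(&b->lock);
  const int res = tryget(b->data, b->size);
  pthread_rwlock_unlock(&b->lock);
  return res;
}
/* ...but if the bucket is still being set up at that point and no other
 * thread can reach it yet, the read lock we already hold should do. */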

Brian, if you want to test this, the github branch should compile
cleanly and probably won't eat your data. I wouldn't trust it without
good backups, though; I've only done very minimal testing. I also
haven't hooked up the cache invalidation yet, so if you change the size
settings it's probably wise to do a "rm -fr ~/.cache/darktable/*"
before restarting darktable.

Cheers,

Pedro
