I was finally able to do a full import of my photo sets into
darktable. I'm importing a bunch of JPG/MRW/ARW files, some with
accompanying XMP files, about 25k files in total. What I noticed during
the import was that the time to import each photo grew linearly with
the number of photos already imported, so the total time to import N
photos grows quadratically. Here are my observations (photos imported
so far, elapsed time, and seconds per image over the most recent batch):

1000 photos: 56s - 0.056 seconds per image
2000 photos: 2m12s - 0.076 seconds per image
3000 photos: 3m35s - 0.083 seconds per image
4000 photos: 5m6s - 0.091 seconds per image
5000 photos: 6m42s - 0.096 seconds per image
6000 photos: 8m30s - 0.108 seconds per image
7700 photos: 13m3s - 0.161 seconds per image
9000 photos: 16m56s - 0.179 seconds per image
10100 photos: 19m54s - 0.162 seconds per image
11600 photos: 23m52s - 0.159 seconds per image
12700 photos: 27m7s - 0.177 seconds per image
13700 photos: 30m4s - 0.177 seconds per image
15000 photos: 34m11s - 0.19 seconds per image
16000 photos: 37m34s - 0.203 seconds per image
17000 photos: 41m36s - 0.242 seconds per image
18500 photos: 48m57s - 0.294 seconds per image
20000 photos: 55m52s - 0.277 seconds per image
21000 photos: 1h0m42s - 0.29 seconds per image
22100 photos: 1h6m10s - 0.298 seconds per image
23000 photos: 1h10m32s - 0.291 seconds per image
24000 photos: 1h16m52s - 0.38 seconds per image
24500 photos: 1h19m22s - 0.3 seconds per image

A linear regression gives a time per image of 0.045 + 1.14e-5*N
seconds (r^2 = 0.94), growing linearly with the number of images
already imported (N).
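
For anyone who wants to redo the fit, here is a minimal sketch in plain Python. It assumes the x value for each row is the "photos imported so far" figure and the y value is the per-image time from the same row; the function name fit_line is only illustrative. Under those assumptions the result should land close to the coefficients quoted above.

```python
# (photos imported so far, seconds per image over the most recent batch),
# copied from the table of observations.
observations = [
    (1000, 0.056), (2000, 0.076), (3000, 0.083), (4000, 0.091),
    (5000, 0.096), (6000, 0.108), (7700, 0.161), (9000, 0.179),
    (10100, 0.162), (11600, 0.159), (12700, 0.177), (13700, 0.177),
    (15000, 0.19), (16000, 0.203), (17000, 0.242), (18500, 0.294),
    (20000, 0.277), (21000, 0.29), (22100, 0.298), (23000, 0.291),
    (24000, 0.38), (24500, 0.3),
]


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


intercept, slope, r_squared = fit_line(observations)
print(f"time per image = {intercept:.3f} + {slope:.3g} * N seconds "
      f"(r^2 = {r_squared:.2f})")
```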

This was on a Lenovo X220t running Ubuntu 12.04 inside a VirtualBox
VM. Host is Windows 7. Darktable was constantly at 100% CPU
(single-threaded, not using the second CPU), and its IO struggled to
get above 1MB/s even though the disk everything was on can do 30MB/s
(USB 2.0 attached, tested with hdparm).

Probably something in the import process is doing a pass over all the
already imported images for every new image that shows up. This is
incredibly inefficient. As an exercise, extrapolating from the
regression, here's how long it would take on my machine to import
larger collections:

25000 (mine): 1.3 hours
50000 (2x): 4.6 hours
150000 (filling 2TB HD): 37.8 hours
300000 (filling 4TB HD): 147.3 hours (6.1 days)
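
The projections above can be roughly reproduced by summing the per-image model over a whole job. This is a sketch only: it uses the rounded coefficients 0.045 and 1.14e-5 from the regression, so the larger collections come out slightly lower than the figures listed, which were presumably computed from unrounded values. The names import_seconds and import_hours are illustrative.

```python
A = 0.045    # seconds per image at the start of a job (rounded)
B = 1.14e-5  # extra seconds per image for each image already imported (rounded)


def import_seconds(n, a=A, b=B):
    """Total time for one import job of n images.

    Image number k (counting from 0) costs a + b*k seconds, so the total is
    the sum over k = 0..n-1, which is a*n + b*n*(n-1)/2: quadratic in n.
    """
    return a * n + b * n * (n - 1) / 2


def import_hours(n):
    return import_seconds(n) / 3600


for n in (25_000, 50_000, 150_000, 300_000):
    print(f"{n:>7} images: {import_hours(n):6.1f} hours")
```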

As far as I can tell the slowdown depends only on the size of the
import job, not on the size of the existing library. If you do two
imports of 25000 it will only take 2 x 1.3 hours, not 4.6. If that is
indeed the case, something is wrong in the import process, probably
because the normal use case for it is small 100-200 image rolls at a
time.
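
If that observation holds, splitting a large import into smaller jobs should avoid most of the quadratic cost. The sketch below estimates this with the same rounded regression coefficients. It assumes the per-image cost resets at the start of every job, which is only what the timings suggest and has not been confirmed against darktable's code; batched_hours is a hypothetical helper name.

```python
A = 0.045    # seconds per image at the start of a job (rounded)
B = 1.14e-5  # extra seconds per image already imported in the same job (rounded)


def import_seconds(n, a=A, b=B):
    """Total time for a single import job of n images under the linear model."""
    return a * n + b * n * (n - 1) / 2


def batched_hours(total, batch_size):
    """Estimated hours to import `total` images in jobs of `batch_size`.

    Assumes the slowdown resets with each new job (an unverified assumption).
    """
    full_batches, remainder = divmod(total, batch_size)
    seconds = full_batches * import_seconds(batch_size) + import_seconds(remainder)
    return seconds / 3600


for batch in (50_000, 25_000, 5_000, 200):
    print(f"50000 images in jobs of {batch:>6}: {batched_hours(50_000, batch):5.2f} hours")
```

Under this model the total time shrinks steadily as the jobs get smaller, approaching the 0.045 seconds per image baseline.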

Cheers,

Pedro
