On Thursday, 23 August 2012 14:36:09, Uwe Ohse wrote:
> On Sat, Aug 18, 2012 at 10:27:28PM +0200, Vladimir Nadvornik wrote:
> > http://www.mail-archive.com/geeqie-devel@lists.sourceforge.net/msg00356.html
> 
> 1. i'd rather leave out the timestamp in the directory table.
>    One still has to search the sub directories anyway. This information
>    does not help at all.

It works for added or deleted files. For modified files it works as long as 
the editor does not modify the original in place, but writes a new file and
renames it. So it is not 100% reliable, but it could help in specific use cases.
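
A minimal sketch of the check I have in mind, assuming the directory table
stores the mtime seen at the last scan. dir_needs_rescan() is a hypothetical
name, not an existing geeqie function:

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: returns 1 if the directory must be rescanned,
 * 0 if the cached mtime still matches, -1 if the path cannot be
 * stat()ed or is not a directory. */
int dir_needs_rescan(const char *path, time_t cached_mtime)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;              /* directory vanished or is unreadable */
    if (!S_ISDIR(st.st_mode))
        return -1;
    /* Adding, deleting or renaming an entry updates the directory mtime;
     * rewriting a file in place does not. */
    return st.st_mtime != cached_mtime;
}
```

A file that is rewritten in place leaves the directory mtime untouched, which
is exactly the unreliable case described above.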


> 
> 2. The metadata table: Is it for METADATA_PLAIN or METADATA_FORMATTED data?
>    If the former: may i suggest a second table for formatted data? I'd
> rather not search for all the possible different methods to express
> F/5.6...
> 
I thought that METADATA_PLAIN would be enough, but maybe you are right...


> 3. I got a few thousand RAW files where exif_get_metadata (likely the lower
>    level functions do the same thing, too) returns an array for things
> which should just be a single value. (Exif.Image.BitsPerSample)
>    To have some kind of meaningful primary key i included an index field
>    into the table (can't use "file_id,key,value" as the value was 8 for all
>    three channels).
>    This also handles duplicate keywords (Xmp.dc.subject) quite fine.

OK.
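
For illustration, the index field could be assigned roughly like this. It is
only a sketch: struct meta_row and assign_indices() are made-up names, and I
assume that all rows passed in belong to the same file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical row layout: (file_id, key, idx) forms the primary key. */
struct meta_row {
    long file_id;
    const char *key;
    int idx;            /* position among the values sharing this key */
    const char *value;
};

/* Number rows that share a key 0, 1, 2, ... in order of appearance.
 * All rows are assumed to belong to the same file. */
void assign_indices(struct meta_row *rows, size_t n)
{
    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int idx = 0;

        /* Count earlier rows with the same key. */
        for (size_t j = 0; j < i; j++)
            if (strcmp(rows[j].key, rows[i].key) == 0)
                idx++;
        rows[i].idx = idx;
    }
}
```

With this, three identical BitsPerSample values of 8 get the indices 0, 1
and 2, so the rows stay distinct even though key and value are the same.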

> 
> 4. Right now the thing works as follows:
>     - read some directory
>     - recursively process subdirectories, come back.
> 
>     - start transaction
>     - process the files in the directory
>     - take care of deleted files and subdirectories.
>     - commit transaction.
> 
>     I use transactions as sqlite is dead slow if you don't (limited to ~60
>     single database changes per second), and the asynchronous mode, not
>     using fsync() and friends, just leads to corrupt databases if someone
>     uses ^C.
> 
>     The downside of this approach is that the database is locked during the
>     transaction, which in a directory with 1000 images might take 2 minutes
>     or more.
> 
>     I could change to use one transaction per file. This still has
> reasonable performance compared to the approach without transaction (but
> see below for "reasonable") and wouldn't block the database for so long,
> but still it would block for seconds.
> 
> 5. If absolutely nothing has changed, my functions need 5.x seconds to read
>    all the directories in ~/Bilder with its 63000 images in 1900
> directories. (4 core Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz, 8 GB, slow
> drive.)
> 
> 6. Importing 15000 images takes about 40 minutes (with two metadata tables
> at the moment).
>    As most of the time is spent inside sqlite and exif_get_metadata i don't
>    see any way to speed this up by much.
>    This is a bit long, as it right now is done at startup, before any user
>    input may be done.
> 
> 
> 7. Performance: 50 .JPG, 50 .ORF, into an already existing large database:
> 
>    - per directory transaction
>    -- first import: 0:16 minutes:seconds
>    -- re-import:    0:19 minutes:seconds
> 
>    - per file transaction
>    -- first import: 2:24 minutes:seconds
>    -- re-import:    2:40 minutes:seconds
> 
>    To me that means that the per directory transaction is the right thing
>    to do, unless one needs to be able to react to user input in that time.
> 
> 8. Performance Part 2:
>     - deleting the metadata of 1000 files takes a *long* time (about 40
>       seconds). Don't know why.
> 


I think that for normal operation, the timestamps can be compared per 
directory, at the end of the filelist_read_real() function. The filelist there 
contains the up-to-date content of the directory, so you need just one 
corresponding SQL query and can then compare the timestamps.
Changed files can then be handled in an idle callback (see g_idle_add()), one 
file per callback, one transaction per callback.
This will ensure that the database is up to date for all files in the working 
directory, which is enough for sorting etc.
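
Roughly like this. It is a sketch in plain C without GLib or sqlite, and all
struct and function names are invented for illustration. update_one_file() is
shaped like the callback you would pass to g_idle_add(): a non-zero return
means "call me again", zero means the queue is done.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

struct file_entry {
    const char *path;
    time_t disk_mtime;   /* from the freshly read file list */
    time_t db_mtime;     /* from the database, 0 if the file is not cached */
};

struct update_queue {
    struct file_entry *files;
    size_t count;
    size_t next;         /* index of the next file to examine */
    size_t updated;      /* number of files refreshed so far */
};

/* One unit of work: refresh at most one changed file, then return. */
int update_one_file(struct update_queue *q)
{
    assert(q != NULL);
    /* Skip files whose cached timestamp is still current. */
    while (q->next < q->count &&
           q->files[q->next].disk_mtime == q->files[q->next].db_mtime)
        q->next++;
    if (q->next == q->count)
        return 0;

    /* A real implementation would begin a transaction here, re-read the
     * metadata of this one file, and commit. */
    q->files[q->next].db_mtime = q->files[q->next].disk_mtime;
    q->updated++;
    q->next++;
    return q->next < q->count;
}
```

Because each call touches at most one file, the database is locked only for
the duration of a single small transaction, and the GUI gets a chance to run
between calls.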

For search it might be enough to use filelist_recursive(), which would trigger 
the above, and then wait until all files are updated. Alternatively you could 
add a special mode with one transaction per directory.



> 9. I tried the following to update the status display in the main window
> (which is visible at that time), but it didn't work.
> 
>       LayoutWindow *lw=NULL;
>       if (!layout_valid(&lw))
>           return;
>       if (lw && !lw->info_status)
>           return;
>       /* we reach this point! */
>       layout_status_update_info(lw,path);
>       layout_status_update_progress(lw,val,"dir");
>       layout_util_status_update_write(lw);
> 
>     Note that i do not make any other GTK calls somewhere.
>     What else do i need to do?

The GUI is redrawn in an idle callback, so the update has no effect until the
calling function returns.


> 
> 9. This might be a good time to define an interface for the rest of
>    geeqie. Any ideas?
> 
>     /* to be used for files and directories */
>     metadatadb_remove(const char *);
>     metadatadb_add(const char *); /* update, too */
>     metadatadb_rename(const char *from, const char *to);
> 
>     GList *metadatadb_get(const char *fname, const char *key);
>     GList *metadatadb_get_filedata(FileData *fd, const char *key);
>       /* = { return metadatadb_get(fd->original_path,key);  ? */
> 

I think that the rest of geeqie can use it via the existing metadata function 
metadata_read_list(): if the file entry in the DB is up to date, then use it 
instead of exif.

Writes can be handled via a notification function, after the metadata are 
written to the file; see file_data_register_notify_func().

Then we need just one function that returns the list of files for a given
query, to be used in the search dialog.


> 10. Must i assume that the legacy metadata stuff is still in use somewhere?
> 

The functions in 9. will take care of it.



> 11. Right now i throw about 150 different exif / xmp tags into the
> database. This is fine for testing... but possibly somewhat much
> otherwise. Is there a list of tags needed by or useful for geeqie?
>     Shall this list be made configurable? If so: how?
> 

We have to cache only the tags that are used for searching or sorting. These 
tags are already hardcoded in the code, so I think we can hardcode this
list too.
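
Something along these lines would do. The tag names below are only an
illustrative subset that I picked for the example, not the actual list used by
geeqie's search and sort code, and tag_is_cached() is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset only; the real list would mirror the tags that
 * the search and sort code already uses. */
static const char *const cached_tags[] = {
    "Exif.Photo.DateTimeOriginal",
    "Exif.Photo.FNumber",
    "Xmp.dc.subject",
};

/* Returns 1 if the tag should be stored in the database, 0 otherwise. */
int tag_is_cached(const char *key)
{
    size_t n = sizeof(cached_tags) / sizeof(cached_tags[0]);

    assert(key != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(cached_tags[i], key) == 0)
            return 1;
    return 0;
}
```

The import code would then skip every tag for which tag_is_cached() returns
0, which should cut down the roughly 150 tags per file considerably.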

Vladimir


_______________________________________________
Geeqie-devel mailing list
Geeqie-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geeqie-devel