On Jan 14, 2008, at 2:14 PM, Christiaan Hofman wrote:

> (continued from user list)
>
> I think we could do the following when loading a cached index:
>
> - Iterate the pubs in the document to get all linked file URLs
> - Iterate the SKDocuments in the index to get all the indexed URLs
> (if there was a cached index)
> - Compare the two sets. Remove and add SKDocuments for URLs as needed
>
> Apart from the first part this can all be done on a secondary thread.
>
> Note that what we index is basically the text of the files at the
> URLs. The relation to the pubs can be recreated separately each time;
> it does not need to be persistent. When we search we also just follow
> that relation as the last (and easy) step.

That sounds like it would work.
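
A minimal sketch of the comparison step, written in Python for brevity
rather than in BibDesk's Objective-C. The function name and the use of
plain URL strings are assumptions for illustration; nothing here is
Search Kit API.

```python
def reconcile(linked_urls, indexed_urls):
    """Compare the URLs the document links to with the URLs already indexed.

    Both arguments are iterables of URL strings (hypothetical inputs).
    Returns (to_add, to_remove) as two sets.
    """
    linked = set(linked_urls)
    indexed = set(indexed_urls)
    to_add = linked - indexed      # linked by a pub but not yet in the index
    to_remove = indexed - linked   # in the index but no longer linked
    return to_add, to_remove
```

With no cached index, indexed_urls is empty and everything ends up in
to_add, so the same code path covers a fresh index.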

> I'm not even sure if we need the time stamp for the .bib file when we
> do it this way, as we update the URLs anyway. I think it would even
> work with a partially finished index that was cached.

I'd trash it and start over again unless it's absolutely certain to be  
in a consistent state (i.e. properly flushed and closed when the  
document closed or app quit).  Search Kit is very unforgiving.

> The only fragility AFAICS is when a user replaces the file at a linked
> file URL. That is hard to fix. The only way I can think of is to
> cache time stamps for every linked file with the index, though that
> may be slow.

Checking a time stamp would be trivial compared to indexing, but I  
don't think a time stamp is sufficient.  The best we can do is  
probably store the sha1 hash of each file and check it again each  
time; for typical file sizes (<100 MB) sha1 is reasonably fast in  
Terminal.  We have to concentrate on making it bulletproof before  
making it fast, anyway.
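
Roughly what the hash check could look like, again as a Python sketch
rather than real BibDesk code. The function names and the dict of
cached digests are assumptions; how the digests would actually be
stored next to the index is left open.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(cached_digests):
    """Return the paths whose content no longer matches the cached digest.

    cached_digests is a hypothetical dict mapping path -> hex digest saved
    when the file was indexed. Unreadable or missing files count as changed.
    """
    changed = []
    for path, old_digest in cached_digests.items():
        try:
            current = sha1_of_file(path)
        except OSError:
            changed.append(path)
            continue
        if current != old_digest:
            changed.append(path)
    return changed
```

Anything returned by changed_files would be removed from the index and
added again.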

Some other things to keep in mind: at present indexes can have  
obsolete files because files are only removed if their owning BibItem  
is deleted.  IIRC it's possible for the same file to be added multiple  
times because of this, since it gets a new URL after being autofiled.   
We can't necessarily remove files when they're renamed because we  
don't know if some other pub still has a link to that file.  Mappings  
like that (multiple items->file) aren't correctly supported right now,  
and the last pub to add a file gets associated with it.  I use file  
content search frequently, but I've never bothered trying to fix this  
because it doesn't affect my work.  Pruning at startup would probably  
be sufficient to take care of the renaming issue.
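
One way the multiple items->file mapping and the startup pruning could
be handled, as a hedged Python sketch. The class and method names are
hypothetical and do not correspond to anything in BibDesk; pubs and
files are represented by plain strings.

```python
from collections import defaultdict


class FileOwners:
    """Track which pubs link to each file URL (many pubs -> one file)."""

    def __init__(self):
        self._owners = defaultdict(set)  # url -> set of pub identifiers

    def link(self, pub_id, url):
        self._owners[url].add(pub_id)

    def unlink(self, pub_id, url):
        """Drop one link; return True if no pub links to url any more."""
        owners = self._owners.get(url)
        if owners is None:
            return True
        owners.discard(pub_id)
        if owners:
            return False  # some other pub still links to this file
        del self._owners[url]
        return True

    def pubs_for(self, url):
        """All pubs linked to url, not only the last one to add it."""
        return set(self._owners.get(url, ()))

    def prune(self, indexed_urls):
        """Return the indexed URLs that no pub links to (safe to remove)."""
        return {u for u in indexed_urls if u not in self._owners}
```

A file would be removed from the index only when unlink returns True,
and prune would run once when the cached index is loaded.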

-- 
adam

_______________________________________________
Bibdesk-develop mailing list
Bibdesk-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bibdesk-develop
