On Jan 14, 2008, at 2:14 PM, Christiaan Hofman wrote:

> (continued from user list)
>
> I think we could do the following when loading a cached index:
>
> - Iterate the pubs in the document to get all linked file URLs
> - Iterate the SKDocuments in the index to get all the indexed URLs
>   (if there was a cached index)
> - Compare the two sets. Remove and add SKDocuments for URLs as needed
>
> Apart from the first part this can all be done on a secondary thread.
>
> Note that what we index is basically the text of the URLs. The
> relation to the pubs can be recreated each time separately, that does
> not need to be persistent. When we search we also just follow that
> relation as the last (and easy) step.

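The compare step in the quoted list amounts to two set differences. A minimal sketch in Python, for illustration only: the real code would be Objective-C against Search Kit, and the function name `plan_index_sync` and the example URLs are hypothetical.

```python
def plan_index_sync(linked_urls, indexed_urls):
    """Decide which URLs to add to and remove from a cached index.

    linked_urls:  URLs linked from the pubs in the document.
    indexed_urls: URLs found in the cached index (empty if there is no cache).
    Returns (to_add, to_remove), each sorted for stable output.
    """
    linked = set(linked_urls)
    indexed = set(indexed_urls)
    to_add = sorted(linked - indexed)      # linked but not yet indexed
    to_remove = sorted(indexed - linked)   # indexed but no longer linked
    return to_add, to_remove


# Hypothetical example data.
linked = ["file:///papers/a.pdf", "file:///papers/b.pdf"]
indexed = ["file:///papers/b.pdf", "file:///papers/old.pdf"]
to_add, to_remove = plan_index_sync(linked, indexed)
print("add:", to_add)
print("remove:", to_remove)
```

Only gathering `linked_urls` touches the document's pubs; the comparison itself works on plain URL sets, which fits the remark that everything after the first step can run on a secondary thread.
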
That sounds like it would work.

> I'm not even sure if we need the time stamp for the .bib file when we
> do it this way, as we update the URLs anyway. I think it would even
> work with a partially finished index that was cached.

I'd trash it and start over again unless it's absolutely certain to be
in a consistent state (i.e. properly flushed and closed when the
document closed or the app quit). Search Kit is very unforgiving.

> The only fragility AFAICS is when a user replaces the file at a linked
> file URL. That is hard to fix. The only way I can think of is to
> cache time stamps for every linked file with the index. Though that
> may be slow.

Checking a time stamp would be trivial compared to indexing, but I
don't think a time stamp is sufficient. The best we can do is probably
to store the sha1 hash of each file and check it again each time; for
typical file sizes (<100 MB) sha1 is reasonably fast when run in
Terminal. We have to concentrate on making it bulletproof before
making it fast, anyway.

Some other things to keep in mind: at present indexes can have obsolete
files because files are only removed if their owning BibItem is
deleted. IIRC it's possible for the same file to be added multiple
times because of this, since it gets a new URL after being autofiled.
We can't necessarily remove files when they're renamed because we don't
know if some other pub still has a link to that file. Mappings like
that (multiple items->file) aren't correctly supported right now, and
the last pub to add a file gets associated with it. I use file content
search frequently, but I've never bothered trying to fix this because
it doesn't affect my work. Pruning at startup would probably be
sufficient to take care of the renaming issue.

--
adam
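
The sha1 check suggested above could look roughly like the following. This is a sketch in Python for illustration, not the BibDesk implementation: the helper names `sha1_of_file` and `changed_files`, and the idea of keeping the hashes in a dictionary keyed by path, are assumptions.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large files are never held in memory whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(cached_hashes, paths):
    """Return the paths whose content no longer matches the cached hash.

    cached_hashes maps path -> hex digest stored alongside the index
    (an assumed layout). Paths missing from the cache count as changed.
    """
    return [p for p in paths if cached_hashes.get(p) != sha1_of_file(p)]
```

Files reported by `changed_files` would be removed from the index and indexed again; everything else can keep its cached entry.
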