On 15 Jan 2008, at 8:03 AM, Adam R. Maxwell wrote:
>
> On Jan 14, 2008, at 2:14 PM, Christiaan Hofman wrote:
>
>> (continued from user list)
>>
>> I think we could do the following when loading a cached index:
>>
>> - Iterate the pubs in the document to get all linked file URLs
>> - Iterate the SKDocuments in the index to get all the indexed URLs
>>   (if there was a cached index)
>> - Compare the two sets; remove and add SKDocuments for URLs as needed
>>
>> Apart from the first part, this can all be done on a secondary thread.
>>
>> Note that what we index is basically the text of the URLs. The
>> relation to the pubs can be recreated each time separately; it does
>> not need to be persistent. When we search, we also just follow that
>> relation as the last (and easy) step.
>
> That sounds like it would work.
>
>> I'm not even sure we need the time stamp for the .bib file when we
>> do it this way, as we update the URLs anyway. I think it would even
>> work with a partially finished index that was cached.
>
> I'd trash it and start over unless it's absolutely certain to be in a
> consistent state (i.e. properly flushed and closed when the document
> closed or the app quit). Search Kit is very unforgiving.
>
>> The only fragility, AFAICS, is when a user replaces the file at a
>> linked file URL. That is hard to fix. The only way I can think of is
>> to cache time stamps for every linked file with the index, though
>> that may be slow.
>
> Checking a time stamp would be trivial compared to indexing, but I
> don't think a time stamp is sufficient. The best we can do is
> probably to store the SHA-1 hash of each file and check it again each
> time; for typical file sizes (<100 MB), SHA-1 is reasonably fast in
> Terminal. We have to concentrate on making it bulletproof before
> making it fast, anyway.
>
> Some other things to keep in mind: at present, indexes can contain
> obsolete files, because files are only removed when their owning
> BibItem is deleted.
> IIRC it's possible for the same file to be added multiple times
> because of this, since it gets a new URL after being autofiled. We
> can't necessarily remove files when they're renamed, because we don't
> know whether some other pub still has a link to that file. Mappings
> like that (multiple items -> one file) aren't correctly supported
> right now; the last pub to add a file gets associated with it. I use
> file content search frequently, but I've never bothered trying to fix
> this because it doesn't affect my work. Pruning at startup would
> probably be sufficient to take care of the renaming issue.
>
> --
> adam
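The bookkeeping discussed in the thread above (comparing linked URLs against a cached index, hashing files to detect replacement, and tracking multiple items per file) can be sketched in a few lines. This is a hypothetical Python illustration of the logic only, not BibDesk code; all names here are made up for the example:

```python
import hashlib
from collections import defaultdict


def reconcile(linked_urls, indexed_urls):
    """Compare the file URLs linked from the pubs against the URLs
    already present in a cached index. Returns (to_add, to_remove):
    URLs that still need indexing, and stale URLs to drop."""
    linked, indexed = set(linked_urls), set(indexed_urls)
    return linked - indexed, indexed - linked


def file_digest(path, chunk_size=1 << 16):
    """SHA-1 of a file's contents, read in chunks so large PDFs need
    not fit in memory; comparing digests detects a replaced file even
    when its URL (and possibly its time stamp) is unchanged."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


class FileOwnership:
    """Many-items-to-one-file mapping: a file is pruned from the index
    only when the last item linking to it goes away."""

    def __init__(self):
        self._owners = defaultdict(set)  # file URL -> ids of linking items

    def add_link(self, url, item_id):
        self._owners[url].add(item_id)

    def remove_link(self, url, item_id):
        """Drop one link; returns True when the file has no owners
        left and can safely be removed from the index."""
        owners = self._owners[url]
        owners.discard(item_id)
        if not owners:
            del self._owners[url]
            return True
        return False
```

The `reconcile` step is what would make the .bib time stamp unnecessary, and an ownership map like `FileOwnership` is one way to avoid dropping a file that some other pub still links to; pruning orphans at startup would then be a single pass over `reconcile`'s `to_remove` set.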
I've implemented some dumb index caching for testing. Set
BDSKShouldCacheFileSearchIndexKey to activate it. I haven't tried it
myself yet, so there's a good chance it fails.

Christiaan

_______________________________________________
Bibdesk-develop mailing list
Bibdesk-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bibdesk-develop