On Jan 16, 2008, at 4:16 AM, Christiaan Hofman wrote:
>
> On 15 Jan 2008, at 8:03 AM, Adam R. Maxwell wrote:
>
>>
>> On Jan 14, 2008, at 2:14 PM, Christiaan Hofman wrote:
>>
>>> (continued from user list)
>>>
>>> I think we could do the following when loading a cached index:
>>>
>>> - Iterate the pubs in the document to get all linked file URLs
>>> - Iterate the SKDocuments in the index to get all the indexed URLs (if there was a cached index)
>>> - Compare the two sets. Remove and add SKDocuments for URLs as needed
>>>
>>> Apart from the first part this can all be done on a secondary thread.
>>>
>>> Note that what we index is basically the text of the URLs. The relation to the pubs can be recreated each time separately; it does not need to be persistent. When we search we also just follow that relation as the last (and easy) step.
>>
>> That sounds like it would work.
>>
>>> I'm not even sure if we need the time stamp for the .bib file when we do it this way, as we update the URLs anyway. I think it would even work with a partially finished index that was cached.
>>
>> I'd trash it and start over again unless it's absolutely certain to be in a consistent state (i.e. properly flushed and closed when the document closed or the app quit). Search Kit is very unforgiving.
>>
>>> The only fragility AFAICS is when a user replaces the file at a linked file URL. That is hard to fix. The only way I can think of is to cache time stamps for every linked file with the index. Though that may be slow.
>>
>> Checking a time stamp would be trivial compared to indexing, but I don't think a time stamp is sufficient. The best we can do is probably store the sha1 hash of each file and check it again each time; for typical file sizes (<100 MB) sha1 is reasonably fast in Terminal. We have to concentrate on making it bulletproof before making it fast, anyway.
>>
>> Some other things to keep in mind: at present indexes can have obsolete files because files are only removed if their owning BibItem is deleted. IIRC it's possible for the same file to be added multiple times because of this, since it gets a new URL after being autofiled. We can't necessarily remove files when they're renamed because we don't know if some other pub still has a link to that file. Mappings like that (multiple items->file) aren't correctly supported right now, and the last pub to add a file gets associated with it. I use file content search frequently, but I've never bothered trying to fix this because it doesn't affect my work. Pruning at startup would probably be sufficient to take care of the renaming issue.
>>
>> --
>> adam
>
> I've implemented some dumb index caching for testing. Set BDSKShouldCacheFileSearchIndexKey to activate. I haven't tried it myself yet, so there's a good chance it fails.
ok, I haven't really looked at any of it yet; hopefully later today.
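
For the record, the reconciliation Christiaan describes could look roughly like the sketch below. It's untested and not what's in the tree; the function names are made up, and locking and error handling are left out. I'm also not sure the document iterator takes NULL as the root parent; if it doesn't, you'd have to start from a scheme-level parent created with SKDocumentCreate.

#import <Foundation/Foundation.h>
#import <CoreServices/CoreServices.h>

// Collect the URLs of all indexed SKDocuments by walking the document
// hierarchy recursively (pass NULL as the parent to start at the root).
static void collectIndexedURLs(SKIndexRef index, SKDocumentRef parent, NSMutableSet *indexedURLs)
{
    SKIndexDocumentIteratorRef iterator = SKIndexDocumentIteratorCreate(index, parent);
    SKDocumentRef skDocument;
    while (NULL != (skDocument = SKIndexDocumentIteratorCopyNext(iterator))) {
        if (kSKDocumentStateIndexed == SKIndexGetDocumentState(index, skDocument)) {
            NSURL *url = (NSURL *)SKDocumentCopyURL(skDocument);
            if (url) {
                [indexedURLs addObject:url];
                [url release];
            }
        }
        collectIndexedURLs(index, skDocument, indexedURLs);
        CFRelease(skDocument);
    }
    CFRelease(iterator);
}

// Reconcile a cached index with the linked file URLs gathered from the pubs
// on the main thread; everything in here can run on the secondary thread.
static void reconcileIndexWithLinkedFileURLs(SKIndexRef index, NSSet *linkedFileURLs)
{
    NSMutableSet *indexedURLs = [NSMutableSet set];
    collectIndexedURLs(index, NULL, indexedURLs);

    // Remove SKDocuments whose URL is no longer linked from any pub.
    NSMutableSet *staleURLs = [[indexedURLs mutableCopy] autorelease];
    [staleURLs minusSet:linkedFileURLs];
    NSEnumerator *urlEnum = [staleURLs objectEnumerator];
    NSURL *url;
    while ((url = [urlEnum nextObject])) {
        SKDocumentRef skDocument = SKDocumentCreateWithURL((CFURLRef)url);
        SKIndexRemoveDocument(index, skDocument);
        CFRelease(skDocument);
    }

    // Add SKDocuments for linked files the cached index doesn't have yet.
    NSMutableSet *missingURLs = [[linkedFileURLs mutableCopy] autorelease];
    [missingURLs minusSet:indexedURLs];
    urlEnum = [missingURLs objectEnumerator];
    while ((url = [urlEnum nextObject])) {
        SKDocumentRef skDocument = SKDocumentCreateWithURL((CFURLRef)url);
        // NULL MIME hint lets Search Kit guess the type; assumes
        // SKLoadDefaultExtractorPlugIns() was called at startup.
        SKIndexAddDocument(index, skDocument, NULL, TRUE);
        CFRelease(skDocument);
    }

    SKIndexFlush(index);
}

And for catching a linked file that was replaced in place, comparing a stored digest of the file data against a freshly computed one would do it. CommonCrypto has been there since 10.4, so a minimal version (again just a sketch, with a made-up function name) is something like:

#include <CommonCrypto/CommonDigest.h>

// Returns the SHA-1 digest of the file's contents, or nil if it can't be read.
static NSData *sha1DigestOfFileAtURL(NSURL *fileURL)
{
    NSData *fileData = [NSData dataWithContentsOfURL:fileURL];
    if (nil == fileData)
        return nil;
    unsigned char digest[CC_SHA1_DIGEST_LENGTH];
    CC_SHA1([fileData bytes], (CC_LONG)[fileData length], digest);
    return [NSData dataWithBytes:digest length:CC_SHA1_DIGEST_LENGTH];
}

If the stored digest for a URL differs from the fresh one, the SKDocument for that URL just gets removed and re-added. For really large files the one-shot CC_SHA1 call means reading the whole file into memory, so the incremental CC_SHA1_Init/Update/Final calls would be the better choice there.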