On 15 Jan 2008, at 8:03 AM, Adam R. Maxwell wrote:
>
> On Jan 14, 2008, at 2:14 PM, Christiaan Hofman wrote:
>
>> (continued from user list)
>>
>> I think we could do the following when loading a cached index:
>>
>> - Iterate the pubs in the document to get all linked file URLs
>> - Iterate the SKDocuments in the index to get all the indexed URLs
>>   (if there was a cached index)
>> - Compare the two sets; remove and add SKDocuments for URLs as needed
>>
>> Apart from the first part, this can all be done on a secondary thread.
>>
>> Note that what we index is basically the text of the URLs. The
>> relation to the pubs can be recreated each time separately; it does
>> not need to be persistent. When we search, we also just follow that
>> relation as the last (and easy) step.
>
> That sounds like it would work.
>
>> I'm not even sure we need the time stamp for the .bib file when we
>> do it this way, as we update the URLs anyway. I think it would even
>> work with a partially finished index that was cached.
>
> I'd trash it and start over unless it's absolutely certain to be in a
> consistent state (i.e. properly flushed and closed when the document
> closed or the app quit). Search Kit is very unforgiving.
>
>> The only fragility, AFAICS, is when a user replaces the file at a
>> linked file URL. That is hard to fix. The only way I can think of is
>> to cache time stamps for every linked file with the index, though
>> that may be slow.
>
> Checking a time stamp would be trivial compared to indexing, but I
> don't think a time stamp is sufficient. The best we can do is
> probably to store the SHA-1 hash of each file and check it again each
> time; for typical file sizes (<100 MB), SHA-1 is reasonably fast in
> Terminal. We have to concentrate on making it bulletproof before
> making it fast, anyway.
>
> Some other things to keep in mind: at present, indexes can contain
> obsolete files, because files are only removed when their owning
> BibItem is deleted.
> IIRC it's possible for the same file to be added multiple times
> because of this, since it gets a new URL after being autofiled. We
> can't necessarily remove files when they're renamed, because we don't
> know whether some other pub still has a link to that file. Mappings
> like that (multiple items -> one file) aren't correctly supported
> right now; the last pub to add a file gets associated with it. I use
> file content search frequently, but I've never bothered trying to fix
> this because it doesn't affect my work. Pruning at startup would
> probably be sufficient to take care of the renaming issue.
>
> --
> adam
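The bookkeeping discussed in the thread above (comparing linked URLs against a cached index, hashing files to detect replacement, and tracking multiple items per file) can be sketched in a few lines. This is a hypothetical Python illustration of the logic only, not BibDesk code; all names here are made up for the example:

```python
import hashlib
from collections import defaultdict


def reconcile(linked_urls, indexed_urls):
    """Compare the file URLs linked from the pubs against the URLs
    already present in a cached index. Returns (to_add, to_remove):
    URLs that still need indexing, and stale URLs to drop."""
    linked, indexed = set(linked_urls), set(indexed_urls)
    return linked - indexed, indexed - linked


def file_digest(path, chunk_size=1 << 16):
    """SHA-1 of a file's contents, read in chunks so large PDFs need
    not fit in memory; comparing digests detects a replaced file even
    when its URL (and possibly its time stamp) is unchanged."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


class FileOwnership:
    """Many-items-to-one-file mapping: a file is pruned from the index
    only when the last item linking to it goes away."""

    def __init__(self):
        self._owners = defaultdict(set)  # file URL -> ids of linking items

    def add_link(self, url, item_id):
        self._owners[url].add(item_id)

    def remove_link(self, url, item_id):
        """Drop one link; returns True when the file has no owners
        left and can safely be removed from the index."""
        owners = self._owners[url]
        owners.discard(item_id)
        if not owners:
            del self._owners[url]
            return True
        return False
```

The `reconcile` step is what would make the .bib time stamp unnecessary, and an ownership map like `FileOwnership` is one way to avoid dropping a file that some other pub still links to; pruning orphans at startup would then be a single pass over `reconcile`'s `to_remove` set.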
I've implemented some dumb index caching for testing. Set
BDSKShouldCacheFileSearchIndexKey to activate it. I haven't tried it
myself yet, so there's a good chance it fails.

Christiaan

_______________________________________________
Bibdesk-develop mailing list
Bibdesk-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bibdesk-develop