On Jan 16, 2008, at 4:16 AM, Christiaan Hofman wrote:
>
> On 15 Jan 2008, at 8:03 AM, Adam R. Maxwell wrote:
>
>>
>> On Jan 14, 2008, at 2:14 PM, Christiaan Hofman wrote:
>>
>>> (continued from user list)
>>>
>>> I think we could do the following when loading a cached index:
>>>
>>> - Iterate the pubs in the document to get all linked file URLs
>>> - Iterate the SKDocuments in the index to get all the indexed URLs (if there was a cached index)
>>> - Compare the two sets. Remove and add SKDocuments for URLs as needed
>>>
>>> Apart from the first part this can all be done on a secondary thread.
>>>
>>> Note that what we index is basically the text of the URLs. The relation to the pubs can be recreated each time separately; it does not need to be persistent. When we search we also just follow that relation as the last (and easy) step.
>>
>> That sounds like it would work.
>>
>>> I'm not even sure if we need the time stamp for the .bib file when we do it this way, as we update the URLs anyway. I think it would even work with a partially finished index that was cached.
>>
>> I'd trash it and start over again unless it's absolutely certain to be in a consistent state (i.e. properly flushed and closed when the document closed or the app quit). Search Kit is very unforgiving.
>>
>>> The only fragility AFAICS is when a user replaces the file at a linked file URL. That is hard to fix. The only way I can think of is to cache time stamps for every linked file with the index. Though that may be slow.
>>
>> Checking a time stamp would be trivial compared to indexing, but I don't think a time stamp is sufficient. The best we can do is probably store the sha1 hash of each file and check it again each time; for typical file sizes (<100 MB) sha1 is reasonably fast in Terminal. We have to concentrate on making it bulletproof before making it fast, anyway.
>>
>> Some other things to keep in mind: at present indexes can have obsolete files because files are only removed if their owning BibItem is deleted. IIRC it's possible for the same file to be added multiple times because of this, since it gets a new URL after being autofiled. We can't necessarily remove files when they're renamed because we don't know if some other pub still has a link to that file. Mappings like that (multiple items->file) aren't correctly supported right now, and the last pub to add a file gets associated with it. I use file content search frequently, but I've never bothered trying to fix this because it doesn't affect my work. Pruning at startup would probably be sufficient to take care of the renaming issue.
>>
>> --
>> adam
>
> I've implemented some dumb index caching for testing. Set BDSKShouldCacheFileSearchIndexKey to activate. I haven't tried it myself yet, so there's a good chance it fails.
ok, I haven't really looked at any of it yet; hopefully later today.
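
For the record, the reconciliation Christiaan describes could look roughly like the sketch below. It's untested and not what's in the tree; the function names are made up, and locking and error handling are left out. I'm also not sure the document iterator takes NULL as the root parent; if it doesn't, you'd have to start from a scheme-level parent created with SKDocumentCreate.

#import <Foundation/Foundation.h>
#import <CoreServices/CoreServices.h>

// Collect the URLs of all indexed SKDocuments by walking the document
// hierarchy recursively (pass NULL as the parent to start at the root).
static void collectIndexedURLs(SKIndexRef index, SKDocumentRef parent, NSMutableSet *indexedURLs)
{
    SKIndexDocumentIteratorRef iterator = SKIndexDocumentIteratorCreate(index, parent);
    SKDocumentRef skDocument;
    while (NULL != (skDocument = SKIndexDocumentIteratorCopyNext(iterator))) {
        if (kSKDocumentStateIndexed == SKIndexGetDocumentState(index, skDocument)) {
            NSURL *url = (NSURL *)SKDocumentCopyURL(skDocument);
            if (url) {
                [indexedURLs addObject:url];
                [url release];
            }
        }
        collectIndexedURLs(index, skDocument, indexedURLs);
        CFRelease(skDocument);
    }
    CFRelease(iterator);
}

// Reconcile a cached index with the linked file URLs gathered from the pubs
// on the main thread; everything in here can run on the secondary thread.
static void reconcileIndexWithLinkedFileURLs(SKIndexRef index, NSSet *linkedFileURLs)
{
    NSMutableSet *indexedURLs = [NSMutableSet set];
    collectIndexedURLs(index, NULL, indexedURLs);

    // Remove SKDocuments whose URL is no longer linked from any pub.
    NSMutableSet *staleURLs = [[indexedURLs mutableCopy] autorelease];
    [staleURLs minusSet:linkedFileURLs];
    NSEnumerator *urlEnum = [staleURLs objectEnumerator];
    NSURL *url;
    while ((url = [urlEnum nextObject])) {
        SKDocumentRef skDocument = SKDocumentCreateWithURL((CFURLRef)url);
        SKIndexRemoveDocument(index, skDocument);
        CFRelease(skDocument);
    }

    // Add SKDocuments for linked files the cached index doesn't have yet.
    NSMutableSet *missingURLs = [[linkedFileURLs mutableCopy] autorelease];
    [missingURLs minusSet:indexedURLs];
    urlEnum = [missingURLs objectEnumerator];
    while ((url = [urlEnum nextObject])) {
        SKDocumentRef skDocument = SKDocumentCreateWithURL((CFURLRef)url);
        // NULL MIME hint lets Search Kit guess the type; assumes
        // SKLoadDefaultExtractorPlugIns() was called at startup.
        SKIndexAddDocument(index, skDocument, NULL, TRUE);
        CFRelease(skDocument);
    }

    SKIndexFlush(index);
}

And for catching a linked file that was replaced in place, comparing a stored digest of the file data against a freshly computed one would do it. CommonCrypto has been there since 10.4, so a minimal version (again just a sketch, with a made-up function name) is something like:

#include <CommonCrypto/CommonDigest.h>

// Returns the SHA-1 digest of the file's contents, or nil if it can't be read.
static NSData *sha1DigestOfFileAtURL(NSURL *fileURL)
{
    NSData *fileData = [NSData dataWithContentsOfURL:fileURL];
    if (nil == fileData)
        return nil;
    unsigned char digest[CC_SHA1_DIGEST_LENGTH];
    CC_SHA1([fileData bytes], (CC_LONG)[fileData length], digest);
    return [NSData dataWithBytes:digest length:CC_SHA1_DIGEST_LENGTH];
}

If the stored digest for a URL differs from the fresh one, the SKDocument for that URL just gets removed and re-added. For really large files the one-shot CC_SHA1 call means reading the whole file into memory, so the incremental CC_SHA1_Init/Update/Final calls would be the better choice there.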