[ 
https://issues.apache.org/jira/browse/JCR-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544401
 ] 

Marcel Reutegger commented on JCR-1213:
---------------------------------------

I recently added some documentation to the website about the index readers:

http://jackrabbit.apache.org/doc/arch/operate/index-readers.html

> to be honest, I cannot yet grasp the big picture about keeping track of the 
> deleted bitset

The new documentation shows how and when the deleted bit set for the 
ReadOnlyIndexReader is created.

The ReadOnlyIndexReaders are indeed constructed on every change. That's very 
unfortunate and should be changed. I'll create an issue for that. While this 
will fix the case where a ReadOnlyIndexReader is re-constructed even though 
nothing changed in that segment, we will still have the issue that a new 
ReadOnlyIndexReader is constructed if a node is deleted in that segment. Even 
in that case we don't want to re-calculate all the UUIDDocIds that point to 
this segment.

> So, instead of using a WeakReference on the multiReader segments, I could get 
> the sharedReader instance out of it

Yes, that's probably the only way to keep the UUIDDocIds valid as long as 
possible. I chose a similar approach in CachingMultiReader.termDocs(Term). The 
relation between the shared reader and the read only reader is held in 
readersByBase. But that's quite ugly.
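
To make the idea concrete, here is a minimal sketch of a cached document number that is keyed on the shared reader rather than on the read-only reader wrapping it. All class and method names below (SharedReader, ReadOnlyView, getBase, CachedDocNumber) are hypothetical stand-ins and not the actual Jackrabbit classes. The sketch also assumes that a document number stays valid for as long as the same shared reader instance is in use.

```java
import java.lang.ref.WeakReference;

// Hypothetical stand-in for the long-lived, per-segment shared reader.
final class SharedReader {
}

// Hypothetical stand-in for the short-lived read-only reader that is
// re-created on changes but always wraps the same shared reader.
final class ReadOnlyView {
    private final SharedReader base;

    ReadOnlyView(SharedReader base) {
        this.base = base;
    }

    SharedReader getBase() {
        return base;
    }
}

// Caches a document number and ties its validity to the shared reader,
// so a freshly constructed ReadOnlyView does not invalidate the entry.
final class CachedDocNumber {
    private WeakReference<SharedReader> owner = new WeakReference<SharedReader>(null);
    private int docNumber = -1;

    void store(ReadOnlyView view, int docNumber) {
        this.owner = new WeakReference<SharedReader>(view.getBase());
        this.docNumber = docNumber;
    }

    // Returns the cached number if the view wraps the same shared reader
    // instance that the number was stored for, otherwise -1.
    int lookup(ReadOnlyView view) {
        return owner.get() == view.getBase() ? docNumber : -1;
    }
}
```

With this arrangement, a lookup through a new ReadOnlyView over the same SharedReader still hits the cache, while a view over a different shared reader misses.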

Thinking more about this issue, it might be worth looking at an alternative. 
There is a DocNumberCache, which maps a UUID to a CachingIndexReader with a 
document number. This is exactly the information that is also present in a 
UUIDDocId. So we might just as well not cache the result in UUIDDocId but 
always use the DocNumberCache to resolve it. However, I'm not sure how much 
overhead that adds. I'll have to investigate that first...
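
A rough sketch of that alternative, under stated assumptions: the UUID-based doc id keeps no cached result and asks a UUID-keyed cache on every call. The names SegmentReader, CacheEntry, UuidDocNumberCache and UuidDocRef are made up for illustration and do not reflect the real DocNumberCache API. The sketch further assumes that the document number in the combined reader is the segment's starting offset plus the number within the segment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-segment reader (not CachingIndexReader).
final class SegmentReader {
    final String name;

    SegmentReader(String name) {
        this.name = name;
    }
}

// One cache entry: the segment holding the document and its number there.
final class CacheEntry {
    final SegmentReader reader;
    final int docNumber;

    CacheEntry(SegmentReader reader, int docNumber) {
        this.reader = reader;
        this.docNumber = docNumber;
    }
}

// Simplified UUID -> (reader, document number) cache without eviction.
final class UuidDocNumberCache {
    private final Map<String, CacheEntry> entries = new HashMap<String, CacheEntry>();

    void put(String uuid, SegmentReader reader, int docNumber) {
        entries.put(uuid, new CacheEntry(reader, docNumber));
    }

    CacheEntry get(String uuid) {
        return entries.get(uuid);
    }
}

// A doc id that stores only the UUID and resolves it on every call.
final class UuidDocRef {
    private final String uuid;

    UuidDocRef(String uuid) {
        this.uuid = uuid;
    }

    // Returns the document number in the combined reader, or -1 if the
    // UUID is not cached or its segment is not part of the given array.
    int resolve(UuidDocNumberCache cache, SegmentReader[] segments, int[] starts) {
        CacheEntry entry = cache.get(uuid);
        if (entry == null) {
            return -1;
        }
        for (int i = 0; i < segments.length; i++) {
            if (segments[i] == entry.reader) {
                return starts[i] + entry.docNumber;
            }
        }
        return -1;
    }
}
```

The cost per resolution is one map lookup plus a scan over the segments, which is the overhead that would have to be measured.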

> UUIDDocId cache does not work properly because of weakReferences in 
> combination with new instance for combined indexreader 
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: JCR-1213
>                 URL: https://issues.apache.org/jira/browse/JCR-1213
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.3.3
>            Reporter: Ard Schrijvers
>             Fix For: 1.4
>
>
> Queries that use ChildAxisQuery or DescendantSelfAxisQuery make use of 
> getParent() functions to know whether the parents are correct and if the 
> result is allowed. The getParent() is called recursively for every hit, and 
> can become very expensive. Hence, in DocId.UUIDDocId, the parents are cached. 
> Currently, DocId.UUIDDocIds are cached by having a WeakReference to the 
> CombinedIndexReader, but this CombinedIndexReader is recreated all the time, 
> implying that a gc() is allowed to remove the 'expensive' cache.
> A much better solution is to not have a WeakReference to the 
> CombinedIndexReader, but a reference to each index reader segment. This 
> means that in getParent(int n) in SearchIndex, the statement 
> return id.getDocumentNumber(this) needs to be replaced by 
> return id.getDocumentNumber(subReaders[i]), with something similar in 
> CachingMultiReader. 
> That is all. Obviously, when a node/property is added/removed/changed, some 
> parts of the cached DocId.UUIDDocId will be invalid, but mainly small indexes 
> are updated frequently, which obviously are less expensive to recompute.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
