[
https://issues.apache.org/jira/browse/JCR-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544401
]
Marcel Reutegger commented on JCR-1213:
---------------------------------------
I recently added some documentation to the website about the index readers:
http://jackrabbit.apache.org/doc/arch/operate/index-readers.html
> to be honest, I cannot yet grasp the big picture about keeping track of the
> deleted bitset
The new documentation shows how and when the deleted bit set for the
ReadOnlyIndexReader is created.
The ReadOnlyIndexReaders are indeed constructed on every change. That's very
unfortunate and should be changed. I'll create an issue for that. While this
will fix the case where a ReadOnlyIndexReader is re-constructed even though
nothing changed in that segment, we will still have the issue that a new
ReadOnlyIndexReader is constructed if a node is deleted in that segment. Even
in that case we don't want to re-calculate all the UUIDDocIds that point to
this segment.
> So, instead of using a WeakReference on the multiReader segments, I could get
> the sharedReader instance out of it
Yes, that's probably the only way to keep the UUIDDocIds valid as long as
possible. I chose a similar approach in CachingMultiReader.termDocs(Term). The
relation between the shared reader and the read only reader is held in
readersByBase. But that's quite ugly.
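
To illustrate the idea of tying the cached value to the shared reader rather
than to the short-lived read-only reader, here is a minimal sketch. All class
names in it (SegmentReader, ReadOnlyView, CachedDocNumber) are hypothetical
stand-ins and not Jackrabbit classes, and the sketch ignores documents deleted
within the segment, which a real implementation would still have to handle.

```java
import java.lang.ref.WeakReference;

/** Hypothetical stand-in for a long-lived, per-segment shared reader. */
final class SegmentReader {
    private final String name;

    SegmentReader(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

/** Hypothetical stand-in for the read-only reader that is re-created on changes. */
final class ReadOnlyView {
    private final SegmentReader shared;

    ReadOnlyView(SegmentReader shared) {
        this.shared = shared;
    }

    SegmentReader shared() {
        return shared;
    }
}

/** Caches a document number against the shared reader it was resolved with. */
final class CachedDocNumber {
    private WeakReference<SegmentReader> owner = new WeakReference<>(null);
    private int docNumber = -1;

    /** Remembers the document number together with the view's shared reader. */
    void store(ReadOnlyView view, int doc) {
        owner = new WeakReference<>(view.shared());
        docNumber = doc;
    }

    /** Returns the cached number if the view wraps the same shared reader, else -1. */
    int lookup(ReadOnlyView view) {
        return owner.get() == view.shared() ? docNumber : -1;
    }
}
```

Because the weak reference points at the shared reader, the cached value
survives the creation of a new read-only view over the same segment.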
Thinking more about this issue it might be worth looking at an alternative.
There is a DocNumberCache, which maps a UUID to a CachingIndexReader with a
document number. This is exactly the information that is also present in a
UUIDDocId. So we might just as well not cache the result in UUIDDocId but
always use the DocNumberCache to resolve it. However, I'm not sure how much
overhead that adds. I'll have to investigate that first...
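
As a rough illustration of that alternative, the sketch below resolves a UUID
through a single bounded cache on every call instead of storing the result in
the doc id object. It does not reproduce the actual DocNumberCache API: the
class UuidDocCache, its nested Location class, and the method names are
assumptions made for this example, and the reader is represented by a plain
Object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical UUID to (reader, document number) cache with LRU eviction. */
final class UuidDocCache {

    /** The reader that owns a document plus the document number in that reader. */
    static final class Location {
        final Object reader;
        final int docNumber;

        Location(Object reader, int docNumber) {
            this.reader = reader;
            this.docNumber = docNumber;
        }
    }

    private final Map<String, Location> entries;

    UuidDocCache(final int maxSize) {
        // An access-ordered LinkedHashMap drops the least recently used entry
        // once the map grows beyond maxSize.
        this.entries = new LinkedHashMap<String, Location>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Location> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Records where the document with the given UUID lives. */
    synchronized void put(String uuid, Object reader, int docNumber) {
        entries.put(uuid, new Location(reader, docNumber));
    }

    /**
     * Returns the cached document number, or -1 if the UUID is unknown or the
     * entry belongs to a different reader instance and must be recomputed.
     */
    synchronized int resolve(String uuid, Object reader) {
        Location loc = entries.get(uuid);
        return (loc != null && loc.reader == reader) ? loc.docNumber : -1;
    }
}
```

Every lookup pays for a synchronized map access, which is the kind of overhead
that would need to be measured before choosing this approach.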
> UUIDDocId cache does not work properly because of weakReferences in
> combination with new instance for combined indexreader
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: JCR-1213
> URL: https://issues.apache.org/jira/browse/JCR-1213
> Project: Jackrabbit
> Issue Type: Improvement
> Components: query
> Affects Versions: 1.3.3
> Reporter: Ard Schrijvers
> Fix For: 1.4
>
>
> Queries that use ChildAxisQuery or DescendantSelfAxisQuery make use of
> getParent() functions to know whether the parents are correct and if the
> result is allowed. The getParent() is called recursively for every hit, and
> can become very expensive. Hence, in DocId.UUIDDocId, the parents are cached.
> Currently, DocId.UUIDDocIds are cached by having a WeakReference to the
> CombinedIndexReader, but this CombinedIndexReader is recreated all the time,
> implying that a gc() is allowed to remove the 'expensive' cache.
> A much better solution is to not have a WeakReference to the
> CombinedIndexReader, but a reference to each index reader segment. This
> means that in getParent(int n) in SearchIndex the statement
> return id.getDocumentNumber(this) needs to be replaced by
> return id.getDocumentNumber(subReaders[i]); and something similar in
> CachingMultiReader.
> That is all. Obviously, when a node/property is added/removed/changed, some
> parts of the cached DocId.UUIDDocId will be invalid, but mainly small indexes
> are updated frequently, which obviously are less expensive to recompute.