[jira] Commented: (JCR-974) Manage Lucene FieldCaches per index segment

Marcel Reutegger (JIRA) Fri, 22 Jun 2007 00:45:46 -0700

    [ 
https://issues.apache.org/jira/browse/JCR-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507153
 ]


Marcel Reutegger commented on JCR-974:
--------------------------------------

Thanks a lot for the revised patch. I was just going to write some comments 
about your previous version and now most of my concerns are already addressed ;)

One of the concerns I had was regarding lazy loading, because it would have 
required synchronization on the map (which was missing in the previous patch).

I'm using a fairly simple test case to measure performance. It involves 
creating 200'000 nodes with a LONG property set to distinct values and then 
executing the following query:

stuff//[EMAIL PROTECTED] order by @foo

On average the execution time with the current codebase is 2200ms, with your 
initial patch: 265ms and with the latest patch: 235ms.

Can you please make sure you consistently use space characters instead of tabs 
in the source code? Thanks.

Now, there's just one thing left. You introduced a cache for the 
ReadOnlyIndexReaders in AbstractIndex. I'd rather not want to have a cache 
there because it means that we have to maintain it. In your patch the map is 
cleaned when the index is invalidated. For older index segments (the bigger 
ones, which resulted from index merges) this means that the map is only cleaned 
when the index segment is closed (when it is part of a merge or on shutdown). 
IMO this is somewhat of a memory leak and should be changed.

I would rather have these three lines at the beginning of the method 
SharedFieldCache.getStringIndex():

        if (reader instanceof ReadOnlyIndexReader) {
            reader = ((ReadOnlyIndexReader) reader).getBase();
        }

It doesn't win a price for beauty but has the same effect as the cache.

WDYT?

> Manage Lucene FieldCaches per index segment
> -------------------------------------------
>
>                 Key: JCR-974
>                 URL: https://issues.apache.org/jira/browse/JCR-974
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.3
>            Reporter: Christoph Kiehl
>         Attachments: ItemStateManagerBasedSortComparator.patch, patch.txt, 
> patch2.txt
>
>
> Jackrabbit uses an IndexSearcher which searches on a single IndexReader which 
> is most likely to be an instance of CachingMultiReader. On every search that 
> does sorting or range queries a FieldCache is populated and associated with 
> this instance of a CachingMultiReader. On successive queries which operate on 
> this CachingMultiReader you will get a tremendous speedup for queries which 
> can reuse  those associated FieldCache instances.
> The problem is that Jackrabbit creates a new CachingMultiReader _everytime_ 
> one of the underlying indexes are modified. This means if you just change 
> _one_ item in the repository you will need to rebuild all those FieldCaches 
> because the existing FieldCaches are associated with the old instance of 
> CachingMultiReader.
> This does not only lead to slow search response times for queries which 
> contains range queries or are sorted by a field but also leads to massive 
> memory consumption (depending on the size of your indexes) because there 
> might be multiple instances of CachingMultiReaders in use if you have a 
> scenario where a lot of queries and item modifications are executed 
> concurrently.
> The goal is to keep those FieldCaches as long as possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (JCR-974) Manage Lucene FieldCaches per index segment

Reply via email to