[ 
https://issues.apache.org/jira/browse/JCR-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656962#comment-15656962
 ] 

Nils Breunese commented on JCR-4057:
------------------------------------

We have fairly large trees of page nodes, and at the top level the overview 
pages paginate to more than 600 pages. Actual site users probably don't visit 
the high page numbers very often, but search engine bots alone seem to be 
enough to have a noticeable impact on our performance. I agree that it might 
make sense for us to revisit the way these pages work to avoid these large 
offset values, but for now it's what we have to work with.

Could you give me some more pointers on alternative approaches that hook into 
Jackrabbit code? I'm not sure what you are referring to there.

For now I guess we'll be using a patched {{QueryResultImpl}}, but that's not 
ideal of course, and maintaining a fork of Jackrabbit is not something I want 
to do either.

> Improve performance of skipping offset nodes for Lucene queries
> ---------------------------------------------------------------
>
>                 Key: JCR-4057
>                 URL: https://issues.apache.org/jira/browse/JCR-4057
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.10.4
>            Reporter: Nils Breunese
>              Labels: performance
>         Attachments: JCR-4057-test.patch, JCR-4057.patch
>
>
> When doing Lucene-based queries with large offset values like 12000 I see 
> pretty bad performance in our system. I have already enabled the 
> {{sizeEstimate}} option to improve performance, but still see queries taking 
> 6 to 66 seconds.
> I have identified the call to {{collectScoreNodes}} for offset nodes in 
> {{org.apache.jackrabbit.core.query.lucene.QueryResultImpl#getResults}} to be 
> the cause of this. The {{collectScoreNodes}} method builds an anonymous 
> {{ArrayList<ScoreNode[]>}} for the offset nodes, which is never used after 
> it is built, so it wastes memory. It also performs access checks for these 
> nodes, even though they are never returned.
> I have attached a patch against Jackrabbit 2.10.4 which just calls {{skip}} 
> on the {{MultiColumnQueryHits result}} instead. With this patch our query 
> times seem to stay under 2 seconds.
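
The difference between the two strategies described above can be sketched in 
isolation. This is only an illustration, not the attached patch and not 
Jackrabbit code: the {{Hits}} interface, {{ListHits}}, and both paging methods 
are hypothetical stand-ins, and the sketch assumes every hit is readable, so 
skipping the access checks does not change which hits end up on the page.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetSkipSketch {

    /** Hypothetical forward-only hit source; NOT Jackrabbit's actual interface. */
    interface Hits {
        /** Returns the next hit, or null when exhausted. */
        String next();

        /** Advances past up to n hits without materializing them. */
        void skip(int n);
    }

    /** Toy implementation that counts how many hits were actually produced. */
    static final class ListHits implements Hits {
        private final int size;
        private int position = 0;
        int materialized = 0;

        ListHits(int size) {
            this.size = size;
        }

        @Override
        public String next() {
            if (position >= size) {
                return null;
            }
            materialized++;
            return "node-" + position++;
        }

        @Override
        public void skip(int n) {
            position = Math.min(size, position + n);
        }
    }

    /** Collect-and-discard: builds a throwaway list for the offset hits. */
    static List<String> pageByCollecting(Hits hits, int offset, int limit) {
        List<String> discarded = new ArrayList<>();
        String hit;
        while (discarded.size() < offset && (hit = hits.next()) != null) {
            discarded.add(hit); // a per-hit access check would also run here
        }
        return collect(hits, limit);
    }

    /** Skip: moves past the offset hits without building anything. */
    static List<String> pageBySkipping(Hits hits, int offset, int limit) {
        hits.skip(offset);
        return collect(hits, limit);
    }

    private static List<String> collect(Hits hits, int limit) {
        List<String> page = new ArrayList<>();
        String hit;
        while (page.size() < limit && (hit = hits.next()) != null) {
            page.add(hit);
        }
        return page;
    }

    public static void main(String[] args) {
        ListHits a = new ListHits(20000);
        ListHits b = new ListHits(20000);
        List<String> collected = pageByCollecting(a, 12000, 10);
        List<String> skipped = pageBySkipping(b, 12000, 10);
        System.out.println(collected.equals(skipped));          // same page either way
        System.out.println(a.materialized + " vs " + b.materialized);
    }
}
```

In this toy setup both methods return the same page, but the collecting 
variant materializes every offset hit first while the skipping variant only 
materializes the hits on the requested page.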



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
