My 2 cents' worth on this. If you really cannot do more 'targeted' queries
that return a reasonable number of results, and you absolutely MUST scan
them one by one, for whatever reason, one alternative is to just do this work
in some daemon thread and save the resulting ID values. In other words,
cache the results. Then whatever system is trying to scan through all
these results will basically have instant access to all the
cached node IDs, like an array. In other words, have a daemon that just
rebuilds a massive array of node IDs every 30 minutes or so (I'm assuming you
don't need realtime-correct data, or else you'd need more of an ACID DB
than a JCR). You could store these IDs in a RandomAccessFile, or a
fixed-length file, or even store them as actual nodes with an ordinal
property that lets you query for them by a range of ordinals (ordinal >= x
and ordinal <= y). Usually in software development, when there is an otherwise
unsolvable performance requirement like this, a 'precache all the work' type of
solution is workable. Hope it helps; I'm just pontificating from 30 years of
experience.
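
For illustration, here's a rough, untested sketch of the "store them as nodes
with an ordinal property" variant, written against the plain JCR 2.0 API.
Everything specific in it is made up for the example: the /idCache path, the
'cachedId' and 'ordinal' property names, the /content query being cached, and
the admin credentials. Adapt all of that to your repository.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;

public class IdCacheDaemon {

    private final Repository repository;

    public IdCacheDaemon(Repository repository) {
        this.repository = repository;
    }

    /** Rebuild the ID cache every 30 minutes on a background thread. */
    public void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::rebuildCache, 0, 30, TimeUnit.MINUTES);
    }

    /** Run the expensive query once and store each hit's identifier under /idCache. */
    private void rebuildCache() {
        Session session = null;
        try {
            // Placeholder credentials; use whatever your system uses.
            session = repository.login(new SimpleCredentials("admin", "admin".toCharArray()));
            QueryManager qm = session.getWorkspace().getQueryManager();

            // The expensive query that would otherwise be paged with large offsets.
            Query query = qm.createQuery(
                    "SELECT * FROM [nt:unstructured] AS n WHERE ISDESCENDANTNODE(n, '/content')",
                    Query.JCR_SQL2);

            // Throw away the previous cache and rebuild it from scratch.
            if (session.nodeExists("/idCache")) {
                session.getNode("/idCache").remove();
            }
            Node cacheRoot = session.getRootNode().addNode("idCache", "nt:unstructured");

            long ordinal = 0;
            for (NodeIterator it = query.execute().getNodes(); it.hasNext(); ) {
                Node hit = it.nextNode();
                Node entry = cacheRoot.addNode("entry-" + ordinal, "nt:unstructured");
                entry.setProperty("cachedId", hit.getIdentifier());
                entry.setProperty("ordinal", ordinal++);
            }
            session.save();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (session != null) {
                session.logout();
            }
        }
    }

    /** Fetch one "page" of cached entries by ordinal range instead of OFFSET. */
    public static NodeIterator page(Session session, long from, long to) throws Exception {
        QueryManager qm = session.getWorkspace().getQueryManager();
        Query query = qm.createQuery(
                "SELECT * FROM [nt:unstructured] AS c WHERE ISDESCENDANTNODE(c, '/idCache') "
                        + "AND c.[ordinal] >= " + from + " AND c.[ordinal] <= " + to
                        + " ORDER BY c.[ordinal]",
                Query.JCR_SQL2);
        return query.execute().getNodes();
    }
}

Two caveats on the sketch: for really large result sets you'd want to save in
batches rather than with one session.save() at the end, and a completely flat
list of cache nodes under one parent doesn't scale well in Jackrabbit, so
you'd probably bucket the entries into subfolders of roughly a thousand each.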


Best regards,
Clay Ferguson
[email protected]


On Fri, Nov 11, 2016 at 7:20 AM, Nils Breunese (JIRA) <[email protected]>
wrote:

>
>     [ https://issues.apache.org/jira/browse/JCR-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15657061#comment-15657061 ]
>
> Nils Breunese commented on JCR-4057:
> ------------------------------------
>
> The code could maybe still be changed to not create an anonymous
> {{ArrayList<ScoreNode[]>}} of {{offset}} nodes which is not used anywhere,
> which would cause the code to use less memory, but I guess that wouldn't be
> enough to fix our current performance issue.
>
> > Improve performance of skipping offset nodes for Lucene queries
> > ---------------------------------------------------------------
> >
> >                 Key: JCR-4057
> >                 URL: https://issues.apache.org/jira/browse/JCR-4057
> >             Project: Jackrabbit Content Repository
> >          Issue Type: Improvement
> >          Components: core
> >    Affects Versions: 2.10.4
> >            Reporter: Nils Breunese
> >              Labels: performance
> >         Attachments: JCR-4057-test.patch, JCR-4057.patch
> >
> >
> > When doing Lucene-based queries with large offset values like 12000 I
> see pretty bad performance in our system. I have already enabled the
> {{sizeEstimate}} option to improve performance, but still see queries
> taking 6 to 66 seconds.
> > I have identified the call to {{collectScoreNodes}} for offset nodes in
> {{org.apache.jackrabbit.core.query.lucene.QueryResultImpl#getResults}} to
> be the cause of this. The {{collectScoreNodes}} method builds an anonymous
> {{ArrayList<ScoreNode[]>}} for the offset nodes, which is not used after
> building it, so it uses memory for nothing, and it also does access checks
> for these nodes which are not returned.
> > > I have attached a patch to Jackrabbit 2.10.4 which just calls {{skip}}
> on the {{MultiColumnQueryHits result}}; using this patch, our query times
> seem to stay under 2 seconds.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
