I would like to second this.
I know from previous discussions that it is a design decision of Jackrabbit
not to work exclusively with an RDBMS - if it did, I would be all in favour of
leaning on it to do the hard work.
But I presume Lucene is leaned on to do all the hard work instead (and it is
certainly capable) - but for me query performance seems to be a bit of
voodoo and random without diving into Jackrabbit. I definitely think a lot
of work can be done in that regard.
On 3/1/07, David Johnson <[EMAIL PROTECTED]> wrote:
We are exploring using Jackrabbit in a production environment. I have a
repository that we have created from our content that has > 100K nodes.
Several of our use cases need date range queries and also use 'order
by' frequently. We have noticed that the query time is significantly
slower than necessary. After warming up the repository (i.e., running the
suite of queries once), as an example:
"select * from Column where jcr:path like 'Gossip/ColumnName/Columns/%'
and status <> 'hidden' order by publishDate desc" takes 500 ms to execute -
this is just the execution time, I am not actually using or accessing the
NodeIterator.
Whereas: "select * from Column where jcr:path like
'Gossip/ColumnName/Columns/%' and status <> 'hidden'" takes only 33 ms to
execute.
/jcr:root/Gossip/ColumnName/Columns//element(*,Column)[@publishDate >
xs:dateTime("way in the past") and @publishDate < xs:dateTime("way in the
future") and (@status != 'hidden')] order by @publishDate descending takes
1096 ms to execute.
Clearly dates (ordering and ranges) have a significant impact on query
execution speed.
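
For anyone wanting to reproduce numbers like these, a minimal timing sketch
follows. It assumes the measurement is one timed call after a number of
untimed warm-up runs. QueryTimingSketch and timeMillis are hypothetical
names, and the sleeping lambda is only a stand-in for a real
javax.jcr.query.Query.execute() call.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: times one call after some untimed warm-up runs. */
final class QueryTimingSketch {

    /** Runs the task warmUps times untimed, then returns the elapsed milliseconds of one timed run. */
    static double timeMillis(Callable<?> task, int warmUps) throws Exception {
        for (int i = 0; i < warmUps; i++) {
            task.call(); // warm caches; result and duration are ignored
        }
        long start = System.nanoTime();
        task.call(); // only this call is measured
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload (assumption); a real test would call query.execute() here
        // and would not touch the returned NodeIterator.
        double ms = timeMillis(() -> { Thread.sleep(20); return null; }, 1);
        System.out.printf("execute took %.1f ms%n", ms);
    }
}
```

Timing only the execute call, as above, keeps the cost of iterating the
results out of the measurement.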
Digging into the internals of Jackrabbit, we have noticed that there is an
implementation of RangeQuery that essentially walks the results if the
number of query terms is greater than what Lucene can handle. Reading the
Lucene documentation, it looks like Filters are the recommended method of
implementing "large" range queries, and they also seem like a natural fit
for matching node types - i.e., select * from Column
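
To illustrate the Filter idea, here is a toy sketch using only java.util. It
is not Lucene's or Jackrabbit's code: RangeFilterSketch and its methods are
hypothetical names, and it assumes dates are indexed as strings that sort
lexicographically (e.g. yyyy-MM-dd). Instead of expanding the range into one
query clause per term, it walks the sorted terms inside the range and ORs
their document sets into a single bit set, so the number of distinct dates
in the range does not matter.

```java
import java.util.BitSet;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy inverted index for a single date field: term -> ids of documents holding it. */
final class RangeFilterSketch {

    // Sorted term dictionary, standing in for an index's ordered term list.
    private final NavigableMap<String, BitSet> postings = new TreeMap<>();
    private int maxDoc = 0;

    /** Indexes one document under its date term (assumed to sort lexicographically). */
    void add(int docId, String dateTerm) {
        postings.computeIfAbsent(dateTerm, t -> new BitSet()).set(docId);
        maxDoc = Math.max(maxDoc, docId + 1);
    }

    /** Returns one bit per document, set if the document's term lies in [lower, upper]. */
    BitSet rangeFilter(String lower, String upper) {
        BitSet bits = new BitSet(maxDoc);
        // Walk only the terms inside the range and merge their document sets.
        for (BitSet docs : postings.subMap(lower, true, upper, true).values()) {
            bits.or(docs);
        }
        return bits;
    }

    public static void main(String[] args) {
        RangeFilterSketch index = new RangeFilterSketch();
        index.add(0, "2007-01-15");
        index.add(1, "2007-02-01");
        index.add(2, "2007-02-20");
        index.add(3, "2007-03-01");
        // Prints the ids of the documents dated in February 2007.
        System.out.println(index.rangeFilter("2007-02-01", "2007-02-28"));
    }
}
```

The resulting bit set can then be intersected with the hits of the remaining
clauses (such as status <> 'hidden') using BitSet.and.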
Is there any ongoing work on query optimization and performance? We would
be very interested in such work, including offering any help that we can.
-Dave