Re: Query Performance and Optimization

David Johnson Mon, 12 Mar 2007 14:54:10 -0800

As another example, every ancestor path of a node could be added to the
index. A node at /a/b/c/d/e/f/g, for instance, would have these index
entries:


path1: /a
path2: /a/b
path3: /a/b/c
path4: /a/b/c/d
path5: /a/b/c/d/e
path6: /a/b/c/d/e/f

so queries for specific sub-paths, e.g. select * from my:type where
jcr:path like '/a/b/c/%', could be mapped to a direct Lucene match query,
i.e. path3 = /a/b/c

The index entry to use for the Lucene query could be determined easily by
simple parsing of the path specified in the query.
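
To make the idea concrete, here is a minimal sketch in Python. It is not
Jackrabbit or Lucene code: the function names and the pathN field naming are
only assumptions for illustration, and a real implementation would add these
values as fields on the Lucene document.

```python
def ancestor_path_fields(node_path):
    """Return {"path1": "/a", "path2": "/a/b", ...} for each proper ancestor."""
    segments = [s for s in node_path.split("/") if s]
    fields = {}
    # Stop before the last segment: only parent paths are indexed,
    # not the path of the node itself.
    for depth in range(1, len(segments)):
        fields["path%d" % depth] = "/" + "/".join(segments[:depth])
    return fields


def term_for_like_prefix(like_pattern):
    """Map a pattern such as '/a/b/c/%' to an exact-match (field, value) pair.

    Returns None when the pattern is not a plain descendant prefix, in which
    case the caller would fall back to the regular query evaluation.
    """
    if not like_pattern.endswith("/%"):
        return None
    prefix = like_pattern[:-2]
    # '%' and '_' are LIKE wildcards; be conservative and do not optimize
    # any pattern that still contains one of them.
    if "%" in prefix or "_" in prefix:
        return None
    depth = len([s for s in prefix.split("/") if s])
    if depth == 0:
        return None
    return ("path%d" % depth, prefix)
```

With this, term_for_like_prefix("/a/b/c/%") selects the field path3 with the
value /a/b/c, and any pattern with a wildcard in the middle is left to the
existing query code.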

Perhaps something like this is already in the code.  Are ChildAxisQuery and
DescendantSelfAxisQuery currently used for cases like this?

-Dave

On 3/12/07, Marcel Reutegger <[EMAIL PROTECTED]> wrote:


David Johnson wrote:
> I think I was again focusing on range queries and giving Lucene some way
> of filtering out subsets of the document set, so that the whole document
> set wouldn't have to be walked.  For the date range query the from and to
> dates would most likely share some set of most significant bytes - these
> bytes could just be passed to Lucene as a direct match thereby reducing
> the subset of the collection that would be walked.  If the range query is
> fixed this "optimization" would be unnecessary.  Nevertheless, I still
> wonder if there is additional information that could be stored in Lucene
> to augment the index and improve query processing.

ah, now I see. yes, that might help in some cases. e.g. you could say: get
me all documents with a year value of 2007 and a month value of 7, which
would be equivalent to a range query from 2007-07-01 to 2007-07-31.
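
A rough sketch of that idea in Python, under stated assumptions: the field
names year and year_month and both function names are hypothetical, and
dates are assumed to be indexed as sortable yyyymmdd strings. Nothing here
is taken from the existing Jackrabbit code.

```python
from datetime import date


def prefix_fields(d):
    """Extra exact-match fields stored alongside the full date value."""
    s = d.strftime("%Y%m%d")
    return {"year": s[:4], "year_month": s[:6]}


def coarse_filter(from_date, to_date):
    """Pick the most selective prefix field shared by both range bounds.

    Returns a (field, value) pair, or None when the bounds share no
    indexed prefix (for example, a range that spans two years).
    """
    lo = from_date.strftime("%Y%m%d")
    hi = to_date.strftime("%Y%m%d")
    if lo[:6] == hi[:6]:
        return ("year_month", lo[:6])
    if lo[:4] == hi[:4]:
        return ("year", lo[:4])
    return None
```

For the range 2007-07-01 to 2007-07-31 this yields an exact match on
year_month = 200707. Unless the range covers the whole month or year, the
exact match only narrows the candidate set and the original range check
still has to be applied to the remaining documents.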

> In this case I was considering using the node UUID as the cross-index
> join parameter.  Still, there is the problem of combining the results
> from two different indexes.

there are two issues with this approach:
1) getting the UUID requires lucene to load the document
2) implementing an *efficient* join across system boundaries is not easy,
   even if the documents are sorted.

>> 3) Use the database to provide the indexing structures.
>>
>> To me this seems to be a very interesting option, though it requires
>> considerable effort.
>
> Yes, I agree, this is an interesting option, and it does seem that it
> would take a fair amount of effort.  Your comments on the user list in
> this same thread seem like a start to the thought process needed.  I am
> not very familiar with the details of the PM, although I do think that
> bringing together data storage and indexing will help with improving
> query processing speed, as well as help with some data integrity issues
> that have been discussed in other threads.
>
> Over the weekend, I will see if I can come up with a solution to the
> range query issue discussed above.

great.

regards
  marcel
