Hello,
>
> I agree with you that the current implementation is not optimized for
> queries that check the existence of a property. Your proposed solution
> seems reasonable, I would implement it the same way. There's just one
> minor obstacle: how do we implement this change in a backward compatible
> way? An existing index without this additional field should still work.
Apart from a possible solution: is the policy that moving from some tag to the
latest Jackrabbit version should always be possible without having to re-index?
Would it not be an option to issue some kind of warning that re-indexing is
needed when moving to version X?
My experience with other repositories (Slide), with a custom Lucene indexing
layer on top handling all searches, is that for efficient querying I quite
frequently had to change some indexing settings, which meant re-indexing the
entire repository. IMO, when you need a performant search implementation, you
need to be able to tune which parts you index, and you need to be able to query
on them. I think it should be possible to index a single property in different,
customizable ways. Might this be an option for the indexingConfiguration: being
able to index a single property in multiple ways?
For example: each article (node) has an author property, and I have 10,000,000
nodes. Now I want to see the number of documents for each author whose name
starts with an "S". The only way to query this efficiently, AFAICS, is to query
some indexed field that holds the starting letter of the author's name (perhaps
by configuring in the indexing configuration that the author name should also
be indexed in a separate field, for example with a configured analyzer that
uses Lucene's EdgeNGramTokenizer to index the first letter only), for example
something like:
<global-index-rules>
  <property name="author">
    <copyField dest="author-starting-letter"
               analyzer="mypackage.FirstLetterAnalyzer"/>
  </property>
  <property name="publishdate">
    <copyField dest="publishdate-weeknumber"
               analyzer="mypackage.DateWeeknumberAnalyzer"/>
  </property>
</global-index-rules>
where, for example, publishdate-weeknumber holds the week number of a date
(useful if you need fast searching for all articles published in week X, while
the week number is not a property of the document).
But this would obviously complicate the indexing configuration quite a bit, and
you might need to query on "virtual" properties not defined in .cnd files,
which is probably not possible (though I do not yet know enough about that
part... is this possible with the org.apache.jackrabbit.core.virtual package?)
Bottom line: before thinking about the best way to improve querying for nodes
with existing properties, is a new Jackrabbit release allowed to force people
to re-index? IMO, it is quite a limitation if this is never allowed (AFAIK, a
Lucene index might also become corrupted, or get out of sync in clustered
environments, which makes a re-index necessary anyway).
Regards,
Ard
>
> regards
> marcel
>