> Christoph Kiehl wrote:
> I'm a bit indifferent about 1) because I think the change is not
> fundamental enough to justify a new QueryHandler class. Do you have any
> other plans with the new QueryHandler implementation? If I were to
> implement a SQL based QueryHandler solution I would create a new
> QueryHandler implementation, but not for a small change like that.
Well, I do have some other changes in mind, but I might be seeing the big picture wrong. I have been looking through the indexing code, and I just can't understand why all properties are indexed within the same Lucene field, '_:PROPERTIES'. AFAICS, it complicates queries. Are the reasons for this in 'ChildAxisQuery', 'DerefQuery', 'ParentAxisQuery' or somewhere else? (I haven't looked at these classes yet, so I don't know how they work.)

To me, it seems a much more natural fit for a Lucene index to use a separate Lucene Field for *every* unique property name. So indexing a property modificationDate would not result in the Lucene Field

<_:PROPERTIES> 1:modificationDate?ms27115hc

but in

<1:modificationDate> ms27115hc

This is IMO a much clearer way to index. I think it makes classes like SharedFieldSortComparator redundant, because we could use the standard Lucene sort. It also seems to me that the standard sort would be more efficient than the current JR one. I have not investigated this, but I know that in Lucene, the longer the field values you sort on, the higher the memory consumption. Certainly when sorting large result sets, a string prefix like '1:modificationDate?' can make a difference of *many* MB in memory. OTOH, perhaps the SharedFieldSortComparator already takes care of this in JR; I am not sure.

Furthermore, indexing each property in Lucene under its own property name gives you more flexibility in implementing new kinds of searches. For example: give me all distinct 'authors' and count how many articles each author has, i.e. faceted browsing. With the current indexing strategy, faceted browsing is much harder.
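The difference between the two layouts can be sketched in plain Java. This is only a toy model under stated assumptions: Lucene fields and terms are represented by a hypothetical `Term` class rather than the real Lucene API, the separator is the '?' shown in this mail (the actual character may differ), and the one million distinct values is an assumed example, not a measurement. It shows the same value indexed both ways, the extra characters a sort cache holding one String per distinct term would carry under the shared layout, and a facet count over a per-property 'author' field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: plain Java stands in for Lucene; no Lucene/Jackrabbit API is used.
public class FieldLayoutSketch {

    // Shared field name from the mail; the '?' separator is copied from the mail as-is.
    static final String SHARED_FIELD = "_:PROPERTIES";
    static final char SEP = '?';

    /** Hypothetical stand-in for a Lucene (field, term text) pair. */
    static final class Term {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public String toString() {
            return "<" + field + "> " + text;
        }
    }

    /** Current layout: all properties share one field; the name is baked into the term. */
    static Term sharedLayout(String propName, String value) {
        return new Term(SHARED_FIELD, propName + SEP + value);
    }

    /** Proposed layout: one field per unique property name; the term is just the value. */
    static Term perPropertyLayout(String propName, String value) {
        return new Term(propName, value);
    }

    /**
     * Extra char payload a sort cache holding one String per distinct term
     * would carry under the shared layout (2 bytes per Java char; String
     * object overhead is ignored).
     */
    static long extraSortBytes(String propName, String sampleValue, long distinctValues) {
        int extraChars = sharedLayout(propName, sampleValue).text.length()
                - perPropertyLayout(propName, sampleValue).text.length();
        return extraChars * 2L * distinctValues;
    }

    /**
     * Facet counts under the per-property layout: number of documents per
     * distinct term of one field, assuming each Term in the list comes from
     * a different document.
     */
    static Map<String, Integer> facetCounts(List<Term> postings, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Term t : postings) {
            if (t.field.equals(field)) {
                counts.merge(t.text, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(sharedLayout("1:modificationDate", "ms27115hc")
                + " vs " + perPropertyLayout("1:modificationDate", "ms27115hc"));

        // Assumed example: one million distinct modification dates.
        System.out.println("extra bytes: "
                + extraSortBytes("1:modificationDate", "ms27115hc", 1_000_000L));

        List<Term> postings = new ArrayList<>();
        for (String author : new String[] {"smith", "jones", "smith"}) {
            postings.add(perPropertyLayout("1:author", author));
        }
        System.out.println(facetCounts(postings, "1:author"));
    }
}
```

Running `main` prints the two terms, `extra bytes: 38000000` (the 19-character prefix '1:modificationDate?' at 2 bytes per char, for each of the assumed one million values), and `{jones=1, smith=2}`. Whether the SharedFieldSortComparator actually pays this cost in JR is exactly the open question above; the sketch only shows where the cost would come from.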
Also, as a possible add-on to the indexing configuration class, I have been thinking about enriching the index via the indexing configuration with 'virtual properties'. I need to know what you all think about it, and whether it could be JSR 170/283 compliant. (By the way, I am not sure what org.apache.jackrabbit.core.virtual does; I haven't looked at it. Perhaps it coincides with my ideas, but somebody else might know.)

Suppose I have a property holding a Calendar date. In the frontend I want to be able to search for articles in week X. I do not want to store week X as a property, because it is implicit in the date I already have. I would like to define in the indexing configuration that myproperty also needs to be indexed as, for example, myproperty_weeknr (and specify an analyzer that does this for you), so that I can query on that one. I would do the same with the first letter of each author, to efficiently query all authors starting with an "a". Could this be implemented according to the JSR spec, or is it really incompatible?

So, WDOT about indexing properties in separate Lucene Fields, and about possibly indexing more information per property? My experience with Lucene is that tactical indexing eases querying a lot and gains you lots of performance. So, if you agree with these changes, which I can try to build into Jackrabbit, then I think they might justify building a new QueryHandler class alongside the old one. WDOT?

Regards,
Ard
