RE: Re: improving the scalability in searching

Ard Schrijvers Tue, 21 Aug 2007 14:18:36 -0700

> Christoph Kiehl wrote:
> In general I think it's a good idea to have a 1:1 mapping of properties to
> lucene fields. It's just more natural and easier to understand as you said.
> 
> Performance wise I'm not sure if it will gain you "lots of performance". I just
> had a quick look at the code and found the following places where I think the
> performance will improve:
> 
> 1. DerefQuery can directly query for matching documents instead of iterating
> over all context hits.
> 2. MatchAllScorer would perform better. But you made an even better suggestion
> how to handle those in the future.
> 3. WildcardQuery will probably improve a bit because you have less terms.
> 4. Regarding sorting: We will still need our own sorting because we cache the
> document order per subreader whereas lucenes sorting only caches per reader
> which get invalidated after every write operation. But the initial cache
> creation will be faster.


That is a good point! I think the field prefix of the terms was not used in the
sorting cache, was it? If so, instead of a performance gain, we might gain
quite some memory efficiency (though I am guessing here a little :-) )

> 
> Overall I wouldn't expect a _much_ better performance. Or could you explain
> what other performance improvements you expect?

I think most improvements (performance and memory consumption) are small. The
big fish was indeed the replacement of MatchAllScorer by PROPERTIES_SET. ATM, I
cannot foresee whether there are other parts that might become easier/faster.
Besides keeping all unit tests working, I think I should include a performance
unit test to see if there are substantial gains. My other plan, 'virtual node
indexing' (not real nodes, only for searching), could make searches
substantially faster, but for now this would imply non-JSR custom JR code,
which Marcel has already said he dislikes :-(

An example of something that would gain performance with the 1:1 mapping is one
of the parts I am implementing in a custom class: I want to query for all
distinct terms in field X and count them (faceted views). [The code will be
open source, so if people are interested I can give pointers to it in due time
(or better, if there is room for it in the JR trunk).] I do not think this
could be implemented in a performant way without the 1:1 mapping.
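
A rough sketch of the faceted count, using plain Java collections as a stand-in
for the index. This is a toy model only: the class and method names are
hypothetical and are not Jackrabbit or Lucene API. It assumes every property
has its own term dictionary (the 1:1 mapping), so the distinct values of a
property and their document counts can be read off in one pass.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class FacetSketch {

    // Toy index with a 1:1 mapping (assumption):
    // field name -> (term -> number of documents containing that term).
    private final Map<String, SortedMap<String, Integer>> fields = new TreeMap<>();

    // Index one document, given as property name -> single value.
    public void addDocument(Map<String, String> properties) {
        for (Map.Entry<String, String> p : properties.entrySet()) {
            fields.computeIfAbsent(p.getKey(), k -> new TreeMap<>())
                  .merge(p.getValue(), 1, Integer::sum);
        }
    }

    // All distinct values of one property with their document counts,
    // in term order. Returns an empty map for an unknown property.
    public SortedMap<String, Integer> facetCounts(String property) {
        SortedMap<String, Integer> terms = fields.get(property);
        return terms == null ? new TreeMap<>() : new TreeMap<>(terms);
    }

    public static void main(String[] args) {
        FacetSketch index = new FacetSketch();
        index.addDocument(Map.of("color", "red", "size", "L"));
        index.addDocument(Map.of("color", "blue", "size", "L"));
        index.addDocument(Map.of("color", "red"));
        System.out.println(index.facetCounts("color")); // prints {blue=1, red=2}
        System.out.println(index.facetCounts("size"));  // prints {L=2}
    }
}
```

In this model the facet lookup touches only the terms of the requested
property; how close a real implementation gets to that depends on how the
index exposes its terms per field.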

I am not sure if there is an XPath equivalent to "give me all distinct values
of a property"... probably not, right?

> 
> But I would definitely like to see the 1:1 mapping, because some parts of the
> code become better/easier to understand and even those small performance
> improvements are a gain.

Yes, I think so too. It took me hours to understand them when I first opened
the current Jackrabbit indices with Luke :-)

> I wouldn't mind if you just start working on it ;) I'm sure Marcel is happy to
> answer your questions, as am I if I'm able to ;)
> You could open a second issue for the 1:1 mapping. Then just use those two
> issues and attach patches. I'll definitely review them and try to help.

Ok. I'll file a JIRA issue on Thursday for this, because tomorrow I am occupied
all day.

> 
> Thanks a lot for your efforts!

You're welcome

Regards Ard

> 
> Cheers,
> Christoph
> 
>
