Re: improving the scalability in searching

Christoph Kiehl Tue, 21 Aug 2007 15:29:06 -0700

Ard Schrijvers wrote:

Christoph Kiehl wrote:

4. Regarding sorting: We will still need our own sorting because we cache
the document order per subreader, whereas Lucene's sorting only caches per
reader, and that cache gets invalidated after every write operation. But the
initial cache creation will be faster.


That is a good point! I think in the sorting cache the field prefixes of
the terms were not used, were they? If so, instead of a performance gain, we
might gain quite some memory efficiency (though I am guessing here a little :-) )

Unfortunately it doesn't even help regarding memory consumption, because we only
cache the terms themselves, without the prefixes.
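
To make the per-subreader caching from point 4 concrete, here is a minimal sketch. It uses neither Lucene nor Jackrabbit classes: `Segment` and `PerSegmentSortCache` are hypothetical names, and the sketch assumes that a subreader is immutable and can be identified by a stable name. It only shows why a write that adds one new segment costs one cache load instead of rebuilding everything.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an immutable index segment (a "subreader"). */
final class Segment {
    final String name;        // assumed to be stable across reader reopens
    final String[] sortTerms; // one sort term per document, without field prefix

    Segment(String name, String... sortTerms) {
        this.name = name;
        this.sortTerms = sortTerms;
    }
}

/** Caches sort values per segment instead of per top-level reader. */
final class PerSegmentSortCache {
    private final Map<String, String[]> cache = new HashMap<>();
    int loads = 0; // how often the values of a segment had to be built

    String[] valuesFor(Segment segment) {
        String[] cached = cache.get(segment.name);
        if (cached == null) {
            // Stands in for the expensive enumeration of all terms of a segment.
            cached = segment.sortTerms.clone();
            cache.put(segment.name, cached);
            loads++;
        }
        return cached;
    }

    /** Warms the cache for every segment of a (re)opened reader. */
    void warm(List<Segment> reader) {
        for (Segment segment : reader) {
            valuesFor(segment);
        }
    }
}
```

Warming the cache for a reader with two segments builds two entries. After a write adds a third segment, warming again only builds the entry for the new segment, whereas a cache keyed by the top-level reader would be rebuilt as a whole.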

I think that, besides keeping all unit tests working, I might/should
include a performance unit test, to see if there are substantial gains.

Well, it would be great to have such a performance test, but in my experience the
repository you use to run your test against has to be at least of a certain size
to give a notable difference. It's difficult to create such a repository in a
portable way. It's too big to check into Subversion and too big to create on the
fly. It would be great to have some kind of reference repository. I thought
about taking maybe a Wikipedia snapshot (which are available for download) and
pumping this data into the repository. This will result in quite a big repository ...

I am not sure if there is an XPath equivalent to "give me all distinct
values of a property"... probably not, right?


I'm afraid not.

I wouldn't mind if you just start working on it ;) I'm sure Marcel is happy
to answer your questions, as am I if I'm able to ;) You could open a second
issue for the 1:1 mapping. Then just use those two issues and attach
patches. I'll definitely review them and try to help.


Ok. I'll file a JIRA issue on Thursday for this, because tomorrow I am
occupied all day.


Great!

Cheers,
Christoph
