Re: improving the scalability in searching

Christoph Kiehl Wed, 15 Aug 2007 00:29:52 -0700

Marcel Reutegger wrote:

Ard Schrijvers wrote:
IMO, we should index more (derived) data about a document's properties (I'll return to this in a mail about IndexingConfiguration, to which I think we can add some features that might tackle this) if we want to be able to query fast.
For this specific problem, the solution would be very simple:
I suggest to add
/**
 * Name of the field that contains all available properties that are present
 * for a certain node
 */
public static final String PROPERTIES_SET = "_:PROPERTIES_SET".intern();

and when indexing a node, each property name of that node is added to its
index (few lines of code in NodeIndexer):

Then, searching for all nodes that have a property is one single
docs.seek(terms); plus setting the docFilter. This approach scales to millions of documents easily, with times close to 0 ms. WDYT? Of course, I can implement
this in the trunk.
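
To make the proposal above concrete, here is a minimal sketch of the idea that does not use Lucene at all: a toy in-memory inverted index in which every property name of a node becomes one term in the PROPERTIES_SET field, so that "all nodes that have property X" is a single postings lookup. The class and method names (PropertiesSetSketch, addNode, docsWithProperty) are hypothetical and are not Jackrabbit's NodeIndexer code; only the field name comes from the message.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a Lucene index (assumption: not Jackrabbit code). */
class PropertiesSetSketch {
    /** Field name taken from the proposal above. */
    static final String PROPERTIES_SET = "_:PROPERTIES_SET";

    /** Postings: "field:term" -> document numbers that contain the term. */
    private final Map<String, BitSet> postings = new HashMap<>();

    /** Index a node: add one PROPERTIES_SET term per property name. */
    void addNode(int docId, List<String> propertyNames) {
        for (String name : propertyNames) {
            postings.computeIfAbsent(PROPERTIES_SET + ":" + name,
                    key -> new BitSet()).set(docId);
        }
    }

    /** All documents that have the property: one lookup, no per-document scan. */
    BitSet docsWithProperty(String propertyName) {
        BitSet hits = postings.get(PROPERTIES_SET + ":" + propertyName);
        // Return a copy so callers cannot modify the stored postings.
        return hits == null ? new BitSet() : (BitSet) hits.clone();
    }

    public static void main(String[] args) {
        PropertiesSetSketch index = new PropertiesSetSketch();
        index.addNode(0, Arrays.asList("jcr:title", "jcr:created"));
        index.addNode(1, Arrays.asList("jcr:created"));
        index.addNode(2, Arrays.asList("jcr:title"));
        System.out.println(index.docsWithProperty("jcr:title")); // prints {0, 2}
    }
}
```

The resulting BitSet plays the role of the docFilter mentioned above: the cost of the lookup depends on the one term that is read, not on the total number of documents.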
I agree with you that the current implementation is not optimized for queries that check the existence of a property. Your proposed solution seems reasonable, I would implement it the same way. There's just one minor obstacle: how do we implement this change in a backward compatible way? An existing index without this additional field should still work.

We could use IndexReader.getFieldNames() at startup to check if such a field already exists, which means we have an index in the new format, and then use this information in MatchAllScorer to decide which implementation to use.
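
A rough sketch of that startup check, under stated assumptions: the field names are passed in as a plain Collection<String> standing in for whatever IndexReader.getFieldNames() returns, and the Strategy enum and choose method are hypothetical names, not existing Jackrabbit or Lucene API. The other field name in the example is only illustrative.

```java
import java.util.Arrays;
import java.util.Collection;

/** Hypothetical helper: picks the MatchAllScorer code path from the index format. */
class MatchAllStrategySketch {
    /** Field name from the proposal quoted above. */
    static final String PROPERTIES_SET = "_:PROPERTIES_SET";

    enum Strategy { PROPERTIES_SET_LOOKUP, LEGACY_SCAN }

    /**
     * New-format indexes contain the PROPERTIES_SET field; older indexes do
     * not, so they keep using the existing implementation.
     */
    static Strategy choose(Collection<String> fieldNames) {
        return fieldNames.contains(PROPERTIES_SET)
                ? Strategy.PROPERTIES_SET_LOOKUP
                : Strategy.LEGACY_SCAN;
    }

    public static void main(String[] args) {
        // Index written with the new field.
        System.out.println(choose(Arrays.asList("_:UUID", PROPERTIES_SET)));
        // Index written before the change.
        System.out.println(choose(Arrays.asList("_:UUID")));
    }
}
```

One case this sketch leaves open: an empty index has no fields yet, so it would be classified as old format even if everything added to it later uses the new field.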


Cheers,
Christoph
