Marcel Reutegger wrote:
Ard Schrijvers wrote:
IMO, if we want to be able to query fast, we should index more (derived) data about a document's properties. I'll return to this in a mail about IndexingConfiguration, to which I think we can add some features that might tackle this.
For this specific problem, the solution would be very simple:

I suggest adding

/**
 * Name of the field that contains all available properties that are present
 * for a certain node
 */
public static final String PROPERTIES_SET = "_:PROPERTIES_SET".intern();

and, when indexing a node, each property name of that node is added to this
field of its index document (a few lines of code in NodeIndexer):

Then, searching for all nodes that have a given property takes one single
docs.seek(terms); call plus setting the docFilter. This approach scales easily to millions of documents, with query times close to 0 ms. WDYT? Of course, I can implement
this in the trunk.
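To make the proposal concrete, here is a minimal sketch of the idea in plain Java, assuming a toy in-memory inverted index in place of Lucene. The class PropertiesSetIndex and its methods are hypothetical, not Jackrabbit or Lucene API. Each property name of a node becomes one term of the PROPERTIES_SET field, so an existence query is a single lookup of that term's posting list, the analogue of the docs.seek(...) call mentioned above.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy model of the proposed PROPERTIES_SET field (not Lucene or
 * Jackrabbit code). All terms here belong to that single field; the index keeps
 * one posting list (a BitSet of document numbers) per term.
 */
class PropertiesSetIndex {

    // term (property name) -> documents whose PROPERTIES_SET field contains it
    private final Map<String, BitSet> postings = new HashMap<>();
    private int numDocs = 0;

    /**
     * Indexes one node: every property name becomes a term of PROPERTIES_SET.
     * Returns the document number assigned to the node.
     */
    int addNode(List<String> propertyNames) {
        int doc = numDocs++;
        for (String name : propertyNames) {
            postings.computeIfAbsent(name, k -> new BitSet()).set(doc);
        }
        return doc;
    }

    /**
     * "Has property" query: one term lookup instead of inspecting the
     * properties of every document.
     */
    BitSet nodesWithProperty(String propertyName) {
        BitSet docs = postings.get(propertyName);
        // Return a copy so the caller can use it as a mutable doc filter.
        return docs == null ? new BitSet() : (BitSet) docs.clone();
    }

    int numDocs() {
        return numDocs;
    }
}
```

For example, after indexing two nodes with addNode(Arrays.asList("jcr:title", "author")) and addNode(Arrays.asList("jcr:title")), nodesWithProperty("author") returns only the first document. The price is one extra term per property per node in the index.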

I agree with you that the current implementation is not optimized for queries that check the existence of a property. Your proposed solution seems reasonable; I would implement it the same way. There's just one minor obstacle: how do we implement this change in a backward-compatible way? An existing index without this additional field should still work.

We could use IndexReader.getFieldNames() at startup to check whether such a field already exists. If it does, we have an index in the new format, and MatchAllScorer can use this information to decide which implementation to use.
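A minimal sketch of that startup check, assuming the set of field names has already been read from the index (with Lucene, this is the information IndexReader.getFieldNames() would supply). The class PropertyExistenceMatcher and its two callbacks are hypothetical. The old-format branch is a plain per-document scan that only stands in for the existing MatchAllScorer logic; it is not that code.

```java
import java.util.BitSet;
import java.util.Set;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Hypothetical sketch: pick the existence-query strategy once, at startup. */
class PropertyExistenceMatcher {

    static final String PROPERTIES_SET = "_:PROPERTIES_SET";

    // True if the index was written in the new format.
    private final boolean newFormat;

    PropertyExistenceMatcher(Set<String> fieldNamesInIndex) {
        // An index written before the change has no PROPERTIES_SET field at all.
        this.newFormat = fieldNamesInIndex.contains(PROPERTIES_SET);
    }

    boolean isNewFormat() {
        return newFormat;
    }

    /**
     * @param propertiesSetLookup single-term lookup on PROPERTIES_SET (new format)
     * @param docHasProperty      slow per-document check (old format stand-in)
     */
    BitSet match(String property, int numDocs,
                 Function<String, BitSet> propertiesSetLookup,
                 BiPredicate<Integer, String> docHasProperty) {
        if (newFormat) {
            // New index: one term lookup answers the query.
            return propertiesSetLookup.apply(property);
        }
        // Old index: fall back to checking every document.
        BitSet result = new BitSet(numDocs);
        for (int doc = 0; doc < numDocs; doc++) {
            if (docHasProperty.test(doc, property)) {
                result.set(doc);
            }
        }
        return result;
    }
}
```

Deciding once at startup keeps the per-query cost of the format check at zero. The check is all-or-nothing, though: it assumes that either every node in the index has the field or none does.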

Cheers,
Christoph
