Marcel Reutegger wrote:
Ard Schrijvers wrote:
IMO, we should index more (derived) data about a document's properties if
we want to be able to query fast. (I'll return to this in a mail about
IndexingConfiguration, to which I think we can add some features that
might tackle this.)

For this specific problem, the solution would be very simple. I suggest
adding

    /**
     * Name of the field that contains the names of all properties
     * present on a certain node
     */
    public static final String PROPERTIES_SET = "_:PROPERTIES_SET".intern();

and, when indexing a node, adding each of its property names to this
field of the node's document (just a few lines of code in NodeIndexer).
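
A minimal sketch of the indexing side, assuming a toy in-memory model in place of Lucene. `ToyDocument`, `NodeIndexerSketch.createDoc` and the `uuid` field are hypothetical names, not Jackrabbit's actual NodeIndexer API, and properties are simplified to one string value each. The only point shown is that every property name of a node also becomes a term of the `_:PROPERTIES_SET` field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Lucene Document: a list of (field, term) pairs.
final class ToyDocument {
    final List<String[]> fields = new ArrayList<>();

    void add(String field, String term) {
        fields.add(new String[] {field, term});
    }

    // Returns all terms stored under the given field, in insertion order.
    List<String> termsOf(String field) {
        List<String> terms = new ArrayList<>();
        for (String[] f : fields) {
            if (f[0].equals(field)) {
                terms.add(f[1]);
            }
        }
        return terms;
    }
}

final class NodeIndexerSketch {
    // Same field name as the proposed constant.
    static final String PROPERTIES_SET = "_:PROPERTIES_SET".intern();

    // Builds the document for one node; properties maps property name -> value
    // (simplified: single-valued string properties only).
    static ToyDocument createDoc(String uuid, Map<String, String> properties) {
        ToyDocument doc = new ToyDocument();
        doc.add("uuid", uuid); // hypothetical identifier field
        for (Map.Entry<String, String> p : properties.entrySet()) {
            // Regular value field for the property.
            doc.add(p.getKey(), p.getValue());
            // The proposed addition: record that this property exists on the node.
            doc.add(PROPERTIES_SET, p.getKey());
        }
        return doc;
    }
}
```

The extra cost at indexing time is one additional term per property, which is what makes the existence check below a single term lookup.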

Then, searching for all nodes that have a given property becomes a single
docs.seek(terms) call plus setting the docFilter. This approach easily
scales to millions of documents, with query times close to 0 ms. WDYT? Of
course, I can implement this in the trunk.
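
To illustrate why the lookup is cheap, here is a toy inverted index in plain Java. `ToyPropertyIndex` is a hypothetical class, not Lucene's `TermDocs`: it keeps, for each term of the `_:PROPERTIES_SET` field (i.e. each property name), the set of document numbers containing it. Finding all nodes with a property is one map lookup, and the resulting `BitSet` plays the role of the docFilter; no per-document scan is involved.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical toy inverted index restricted to the _:PROPERTIES_SET field:
// maps each property name (the term text) to the documents that contain it.
final class ToyPropertyIndex {
    private final Map<String, BitSet> postings = new HashMap<>();
    private int nextDoc = 0;

    // Indexes one node given its property names; returns its document number.
    int addNode(Set<String> propertyNames) {
        int docNum = nextDoc++;
        for (String name : propertyNames) {
            postings.computeIfAbsent(name, k -> new BitSet()).set(docNum);
        }
        return docNum;
    }

    // Analogue of a single docs.seek(term): one lookup yields every matching
    // document, usable directly as a doc filter.
    BitSet nodesWithProperty(String propertyName) {
        BitSet hits = postings.get(propertyName);
        // Return a copy so callers cannot modify the index.
        return hits == null ? new BitSet() : (BitSet) hits.clone();
    }
}
```

In Lucene itself, the equivalent would be positioning a TermDocs on the term and collecting its document numbers; the map lookup here only stands in for that.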

I agree with you that the current implementation is not optimized for
queries that check the existence of a property. Your proposed solution
seems reasonable; I would implement it the same way. There's just one
minor obstacle: how do we implement this change in a backward-compatible
way? An existing index without this additional field should still work.
We could call IndexReader.getFieldNames() at startup to check whether
such a field already exists, which would mean the index is in the new
format, and then use this information in MatchAllScorer to decide which
implementation to use.
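
A sketch of that startup check, assuming the index's field names are available as a collection of strings (the analogue of what `IndexReader.getFieldNames()` returns; its exact signature differs between Lucene versions). `MatchAllStrategy`, `IndexFormatCheck` and `selectStrategy` are hypothetical names; the sketch only shows that the decision is made once and that an old-format index falls back to the existing implementation.

```java
import java.util.Collection;

// Hypothetical names for the two ways MatchAllScorer could find nodes with a property.
enum MatchAllStrategy {
    // New format: one term lookup in the _:PROPERTIES_SET field.
    PROPERTIES_SET_LOOKUP,
    // Old format: keep the existing implementation.
    LEGACY_SCAN
}

final class IndexFormatCheck {
    static final String PROPERTIES_SET = "_:PROPERTIES_SET";

    // Called once at startup with the field names present in the index
    // (for example, the collection returned by IndexReader.getFieldNames()).
    static MatchAllStrategy selectStrategy(Collection<String> fieldNames) {
        // If the field exists, the index was written in the new format.
        return fieldNames.contains(PROPERTIES_SET)
                ? MatchAllStrategy.PROPERTIES_SET_LOOKUP
                : MatchAllStrategy.LEGACY_SCAN;
    }
}
```

Note that this check only tells whether some document carries the field; if an old index kept receiving new-format documents without being rebuilt, the older documents would still lack it, so that case would need separate handling.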
Cheers,
Christoph