Hi folks,

We have a few open PRs adding new data to the DocValuesSkipper interface (e.g. 
https://github.com/apache/lucene/pull/15993, 
https://github.com/apache/lucene/pull/15737), and other open issues discussing 
adding more (https://github.com/apache/lucene/issues/15884). We also have some 
ideas here at Elastic for other bits of information that would be useful in 
highly specific circumstances but not really in the general case. These all 
run into issues with backwards compatibility, and questions of how to reliably 
signal to clients what data is available for a given field and segment.

One idea I had that would make this a bit more pluggable, and allow Codecs to 
add additional block-based data without having to alter the base API too much, 
is to add a SkipType object which would be passed to the LeafReader like so:

<T extends DocValuesSkipper> T getDocValuesSkipper(SkipType<T> type)

The codec would check the class of the SkipType and see if it knows how to 
return that information. If it does, it returns an instance of T; if not, it 
returns null. The default type would be a Range<DocValuesSkipper>, which would 
return the basic DocValuesSkipper that we have now, but we can extend things 
with a Count or Cardinality type.  On the indexing side, the FieldInfo could 
record the SkipType so that the codec knows what metadata to generate.
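
To make the dispatch concrete, here is a rough, self-contained sketch of how a codec might answer such a request. None of this is existing Lucene API: Skipper is a cut-down stand-in for DocValuesSkipper, and CountSkipper, SkipType, Range, Count, RangeOnlyReader and the extra field parameter are all hypothetical names used only for illustration.

```java
final class SkipTypeSketch {

  /** Simplified stand-in for Lucene's DocValuesSkipper (only two accessors kept). */
  interface Skipper {
    long minValue();
    long maxValue();
  }

  /** Hypothetical richer skipper that would also expose a value count. */
  interface CountSkipper extends Skipper {
    long valueCount();
  }

  /** Hypothetical type token; T is the skipper flavour the caller wants back. */
  abstract static class SkipType<T extends Skipper> {
    final Class<T> skipperClass;

    SkipType(Class<T> skipperClass) {
      this.skipperClass = skipperClass;
    }
  }

  /** The default type: asks for the basic min/max skipper. */
  static final class Range extends SkipType<Skipper> {
    Range() {
      super(Skipper.class);
    }
  }

  /** An extension type that a given codec may or may not understand. */
  static final class Count extends SkipType<CountSkipper> {
    Count() {
      super(CountSkipper.class);
    }
  }

  /** Reader for an imaginary codec that only wrote range data. */
  static final class RangeOnlyReader {
    private final long min;
    private final long max;

    RangeOnlyReader(long min, long max) {
      this.min = min;
      this.max = max;
    }

    /** The field argument is ignored here; a real reader would look the field up. */
    <T extends Skipper> T getDocValuesSkipper(String field, SkipType<T> type) {
      if (type instanceof Range) {
        Skipper skipper = new Skipper() {
          @Override
          public long minValue() {
            return min;
          }

          @Override
          public long maxValue() {
            return max;
          }
        };
        // Checked cast back to the caller's requested type.
        return type.skipperClass.cast(skipper);
      }
      // Unknown SkipType: this codec has no such data for the field.
      return null;
    }
  }
}
```

A caller would ask for new Count() first and fall back to new Range() (or to not skipping at all) when it gets null back, so older codecs keep working without knowing about newer types.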

Some of these bits of information are useful both as global metadata and as 
part of a skip block; some are only really relevant at the global level.  Tying 
into the work that Ignacio is doing in 
https://github.com/apache/lucene/issues/16052, the global metadata tends to be 
loaded at segment open time and so can be accessed cheaply without doing any 
IO, but because it is part of the general DocValuesSkipper object it can only 
be accessed by calling LeafReader.getDocValuesSkipper() which loads a bunch of 
extra data (and declares that it does IO via its throws clause).

We could add an intermediate object here, analogous to Points or Terms, called 
DocValues (or something similar, I know this is already a class with static 
helper methods on it); this would make the global min, max and docCount (and 
maybe cardinality) available without having to do any further IO, and the 
getSkipper() method could optionally be moved onto the intermediate object.
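
As a rough illustration of what that intermediate object might look like, here is a self-contained sketch. Every name in it (FieldDocValues, SkipperLoader, canMatch, the accessor names) is hypothetical, and Skipper is again only a stand-in for DocValuesSkipper. The point is just that the global values are plain in-memory fields, while only getSkipper() declares IOException.

```java
import java.io.IOException;

final class DocValuesMetadataSketch {

  /** Stand-in for the block-level skipper; obtaining one is assumed to need IO. */
  interface Skipper {
    long minValue();
    long maxValue();
  }

  /** Hypothetical hook through which the codec would load skip data lazily. */
  interface SkipperLoader {
    Skipper load() throws IOException;
  }

  /** Hypothetical intermediate object, analogous to Points or Terms. */
  static final class FieldDocValues {
    private final long globalMin;
    private final long globalMax;
    private final int docCount;
    private final SkipperLoader loader;

    FieldDocValues(long globalMin, long globalMax, int docCount, SkipperLoader loader) {
      this.globalMin = globalMin;
      this.globalMax = globalMax;
      this.docCount = docCount;
      this.loader = loader;
    }

    // Global metadata, assumed to have been read at segment open time: no IO here.
    long globalMinValue() {
      return globalMin;
    }

    long globalMaxValue() {
      return globalMax;
    }

    int docCount() {
      return docCount;
    }

    // Only the block-level data is allowed to touch the index files.
    Skipper getSkipper() throws IOException {
      return loader.load();
    }
  }

  /** Example use: decide whether a segment can match [lower, upper] without IO. */
  static boolean canMatch(FieldDocValues dv, long lower, long upper) {
    return dv.docCount() > 0 && dv.globalMaxValue() >= lower && dv.globalMinValue() <= upper;
  }
}
```

With this shape a query could rule out a whole segment from the global range alone and only call getSkipper() for segments that might actually match.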

What do people think?

- Alan