Hi folks,

We have a few open PRs adding new data to the DocValuesSkipper interface (e.g. https://github.com/apache/lucene/pull/15993, https://github.com/apache/lucene/pull/15737), and other open issues discussing adding more (https://github.com/apache/lucene/issues/15884). We also have some ideas here at Elastic for other bits of information that would be useful in highly specific circumstances but not really in the general case. These all run into backwards-compatibility issues, and raise the question of how to reliably signal to clients what data is available for a given field and segment.

One idea I had that would make this a bit more pluggable, and allow Codecs to add additional block-based data without having to alter the base API too much, is to add a SkipType object which would be passed to the LeafReader like so:

    <T extends DocValuesSkipper> T getDocValuesSkipper(SkipType<T> type)

The codec would check the class of the SkipType and see if it knows how to return that information. If it does, it returns an instance of T; if not, it returns null. The default type would be a Range<DocValuesSkipper>, which would return the basic DocValuesSkipper that we have now, but we could extend things with a Count or Cardinality type.

On the indexing side, the FieldInfo could record the SkipType so that the codec knows what metadata to generate.

Some of these bits of information are useful both as global metadata and as part of a skip block; some are only really relevant at the global level. This ties into the work that Ignacio is doing in https://github.com/apache/lucene/issues/16052. The global metadata tends to be loaded at segment open time, so it can be accessed cheaply without doing any IO. But because it is part of the general DocValuesSkipper object, it can only be accessed by calling LeafReader.getDocValuesSkipper(), which loads a bunch of extra data (and declares that it does IO via its throws clause).

We could add an intermediate object here, analogous to Points or Terms, called DocValues (or something similar; I know this is already a class with static helper methods on it). This would make the global min, max and docCount (and maybe cardinality) available without having to do any further IO, and the getSkipper() method could optionally be moved onto the intermediate object.

What do people think?

- Alan
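
For illustration, the SkipType lookup described above could be sketched roughly as follows. This is a minimal, self-contained sketch and not Lucene code: Skipper, CountSkipper, Range, Count and RangeOnlyReader are hypothetical stand-ins invented for the example, and the field argument that a real LeafReader method would presumably take is left out, as in the signature in the proposal.

```java
import java.util.Objects;

/** Sketch only: every type here is a hypothetical stand-in, not Lucene API. */
final class SkipTypeSketch {

  /** Stand-in for the existing DocValuesSkipper (global range only). */
  interface Skipper {
    long minValue();
    long maxValue();
  }

  /** Hypothetical richer skipper that also exposes a doc count. */
  interface CountSkipper extends Skipper {
    int docCount();
  }

  /** Hypothetical type token; T is the skipper flavour the caller gets back. */
  abstract static class SkipType<T extends Skipper> {
    private final Class<T> skipperClass;

    SkipType(Class<T> skipperClass) {
      this.skipperClass = Objects.requireNonNull(skipperClass);
    }

    /** Checked cast from the base skipper type to the flavour this token names. */
    T cast(Skipper skipper) {
      return skipperClass.cast(skipper);
    }
  }

  /** Default type: asks for the basic range skipper. */
  static final class Range extends SkipType<Skipper> {
    Range() { super(Skipper.class); }
  }

  /** Extension type: asks for a skipper that also carries counts. */
  static final class Count extends SkipType<CountSkipper> {
    Count() { super(CountSkipper.class); }
  }

  /** Toy reader for a codec that only wrote range data for its field. */
  static final class RangeOnlyReader {
    private final long min;
    private final long max;

    RangeOnlyReader(long min, long max) {
      this.min = min;
      this.max = max;
    }

    <T extends Skipper> T getDocValuesSkipper(SkipType<T> type) {
      // The codec inspects the class of the SkipType to decide whether it can serve it.
      if (type instanceof Range) {
        Skipper skipper = new Skipper() {
          @Override public long minValue() { return min; }
          @Override public long maxValue() { return max; }
        };
        return type.cast(skipper);
      }
      return null; // unknown type: this codec has no such data
    }
  }

  public static void main(String[] args) {
    RangeOnlyReader reader = new RangeOnlyReader(10, 42);
    Skipper range = reader.getDocValuesSkipper(new Range());
    CountSkipper count = reader.getDocValuesSkipper(new Count());
    System.out.println("range: [" + range.minValue() + ", " + range.maxValue() + "]");
    System.out.println("count supported: " + (count != null));
  }
}
```

The point of the generic parameter is that callers get back the richer interface without casting, and a null return tells them that this field and segment carry no such data, so they can fall back to another strategy.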
