juan camilo rodriguez duran created LUCENE-9236:
---------------------------------------------------

             Summary: Having a modular Doc Values format
                 Key: LUCENE-9236
                 URL: https://issues.apache.org/jira/browse/LUCENE-9236
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/index
            Reporter: juan camilo rodriguez duran


Today, DocValuesConsumer/DocValuesProducer require overriding 5 different 
methods, even if you only want to use one, and even though a given field can 
only support one doc values type at a time.

 

In the attached PR I’ve implemented a new modular version of those classes 
(consumer/producer), each one having a single responsibility and all of them 
writing to the same single file.

This is mainly a refactoring of the existing format, which opens the 
possibility of overriding or implementing only the sub-format you need.
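
As a rough illustration of the idea (not the code in the PR), a composite can dispatch each field to one small sub-consumer per doc values type while all of them share a single output. Every name below (ValuesType, SubConsumer, NumericSubConsumer, CompositeConsumer) is hypothetical and the byte layout is invented for the example; none of it is Lucene API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for the five doc values types; not the Lucene enum.
enum ValuesType { NUMERIC, BINARY, SORTED, SORTED_NUMERIC, SORTED_SET }

// One sub-consumer per type: each implementation has a single responsibility.
interface SubConsumer {
    void addField(String field, long[] values, DataOutputStream out) throws IOException;
}

// Invented layout for the sketch: type byte, field name, value count, raw longs.
final class NumericSubConsumer implements SubConsumer {
    @Override
    public void addField(String field, long[] values, DataOutputStream out) throws IOException {
        out.writeByte(ValuesType.NUMERIC.ordinal());
        out.writeUTF(field);
        out.writeInt(values.length);
        for (long v : values) {
            out.writeLong(v);
        }
    }
}

// The composite owns the single shared output and only dispatches by type.
final class CompositeConsumer {
    private final Map<ValuesType, SubConsumer> subs = new EnumMap<>(ValuesType.class);
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);

    CompositeConsumer register(ValuesType type, SubConsumer sub) {
        subs.put(type, sub);
        return this;
    }

    void addField(ValuesType type, String field, long[] values) {
        SubConsumer sub = subs.get(type);
        if (sub == null) {
            throw new UnsupportedOperationException("No sub-format registered for " + type);
        }
        try {
            sub.addField(field, values, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    byte[] toBytes() {
        return bytes.toByteArray();
    }
}

final class CompositeDemo {
    public static void main(String[] args) {
        CompositeConsumer consumer =
            new CompositeConsumer().register(ValuesType.NUMERIC, new NumericSubConsumer());
        consumer.addField(ValuesType.NUMERIC, "price", new long[] {10, 20, 30});
        System.out.println(consumer.toBytes().length + " bytes written to the shared output");
    }
}
```

With this shape, supporting only numeric doc values means implementing one small class and registering it; types with no registered sub-consumer fail fast instead of requiring empty overrides.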

 

I’ll do this in 3 steps:
 # Create a CompositeDocValuesFormat and move the code of 
Lucene80DocValuesFormat into separate classes, without modifying the inner 
code. At the same time, I created a Lucene85CompositeDocValuesFormat based on 
these changes.
 # I’ll introduce some basic components for writing doc values in general, 
such as:
 ## DocumentIdSetIterator Serializer: used in each type of field, based on an 
IndexedDISI.
 ## Document Ordinals Serializer: used in Sorted and SortedSet to deduplicate 
values using a dictionary.
 ## Document Boundaries Serializer (optional; used only for multi-valued 
fields: SortedNumeric and SortedSet).
 ## TermsEnum Serializer: used to write and read the terms dictionary for 
Sorted and SortedSet doc values.
 # I’ll create the new sub-DocValues formats using the previous components.
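
To make the ordinals and boundaries components from step 2 more concrete, here is a minimal in-memory sketch of the underlying idea for a SortedSet-like field: terms are deduplicated into a sorted dictionary, each document stores ordinals into that dictionary, and a boundaries array records where each document's ordinals start and end. The class OrdinalsSketch and its methods are hypothetical and only illustrate the technique; this is not the serialization code from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of dictionary ordinals plus per-document boundaries.
final class OrdinalsSketch {
    final String[] dictionary; // sorted, de-duplicated terms
    final int[] ordinals;      // flattened ordinals for all documents
    final int[] boundaries;    // doc i owns ordinals[boundaries[i] .. boundaries[i + 1])

    private OrdinalsSketch(String[] dictionary, int[] ordinals, int[] boundaries) {
        this.dictionary = dictionary;
        this.ordinals = ordinals;
        this.boundaries = boundaries;
    }

    static OrdinalsSketch encode(String[][] docValues) {
        // Build the terms dictionary: every distinct term, in sorted order.
        TreeSet<String> unique = new TreeSet<>();
        for (String[] doc : docValues) {
            unique.addAll(Arrays.asList(doc));
        }
        String[] dictionary = unique.toArray(new String[0]);

        List<Integer> ords = new ArrayList<>();
        int[] boundaries = new int[docValues.length + 1];
        for (int doc = 0; doc < docValues.length; doc++) {
            // Per document, ordinals are sorted and de-duplicated (set semantics).
            TreeSet<Integer> docOrds = new TreeSet<>();
            for (String term : docValues[doc]) {
                docOrds.add(Arrays.binarySearch(dictionary, term));
            }
            ords.addAll(docOrds);
            boundaries[doc + 1] = ords.size();
        }
        int[] ordinals = ords.stream().mapToInt(Integer::intValue).toArray();
        return new OrdinalsSketch(dictionary, ordinals, boundaries);
    }

    // Decodes the terms of one document by following boundaries, then ordinals.
    String[] termsOf(int doc) {
        int start = boundaries[doc];
        String[] terms = new String[boundaries[doc + 1] - start];
        for (int i = 0; i < terms.length; i++) {
            terms[i] = dictionary[ordinals[start + i]];
        }
        return terms;
    }

    public static void main(String[] args) {
        String[][] docs = { {"b", "a"}, {}, {"c", "a", "a"} };
        OrdinalsSketch encoded = OrdinalsSketch.encode(docs);
        System.out.println("dictionary = " + Arrays.toString(encoded.dictionary));
        System.out.println("ordinals   = " + Arrays.toString(encoded.ordinals));
        System.out.println("boundaries = " + Arrays.toString(encoded.boundaries));
    }
}
```

A single-valued Sorted field would need only the dictionary and one ordinal per document, which is why the boundaries component is optional.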



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

