Hi~

We are trying to build an OLAP database based on Lucene, and we rely heavily on 
Lucene's DocValues (as our column store).

We want to use DocValues to store array-typed fields. For example, if we want 
to store field1 and field2 from this JSON document in separate DocValues 
fields, SORTED_NUMERIC and SORTED_SET seem to be our only options.

{
    "field1": [ 3, 1, 1, 2 ],
    "field2": [ "c", "a", "a", "b" ]
}


When we store field1 in SORTED_NUMERIC and field2 in SORTED_SET, we get 
this result:

field1:

  *   original: [3, 1, 1, 2]
  *   in SORTED_NUMERIC: [1, 1, 2, 3]

field2:

  *   original: ["c", "a", "a", "b"]
  *   in SORTED_SET: ords [0, 1, 2] terms ["a", "b", "c"]

The original order of the elements in the array is lost, and in SORTED_SET the 
duplicate "a" is dropped as well.
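
To make the behaviour concrete, here is a minimal plain-Java sketch that mimics what we observe. It does not call Lucene at all; the class and method names (DocValuesOrderSketch, asSortedNumeric, asSortedSetTerms) are made up for illustration, and the sketch only models the per-document semantics as we understand them: SORTED_NUMERIC keeps duplicates but sorts, SORTED_SET sorts and de-duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only: models the observed per-document behaviour
// of SORTED_NUMERIC and SORTED_SET without using any Lucene classes.
public class DocValuesOrderSketch {

    // SORTED_NUMERIC as we observe it: values are sorted, duplicates are kept.
    static long[] asSortedNumeric(long[] values) {
        long[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // SORTED_SET as we observe it: terms are sorted and de-duplicated.
    // Assumption: String natural order is used here as a stand-in for
    // Lucene's byte order; the two agree for the ASCII terms in this example.
    static List<String> asSortedSetTerms(List<String> values) {
        return new ArrayList<>(new TreeSet<>(values));
    }

    public static void main(String[] args) {
        long[] field1 = {3, 1, 1, 2};
        List<String> field2 = List.of("c", "a", "a", "b");

        // prints [1, 1, 2, 3]
        System.out.println(Arrays.toString(asSortedNumeric(field1)));
        // prints [a, b, c]
        System.out.println(asSortedSetTerms(field2));
    }
}
```

In both cases there is no way to recover that the first element of the original array was 3 (or "c").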

We're guessing that Lucene's DocValues are designed primarily for sorting and 
aggregation, so the original order of elements may not matter there.

But in our use case, it is important to keep the original order of the 
elements in the array (we allow users to access the elements in the array using 
the subscript operator).

Does Lucene have plans to add a new type of DocValues that can store 
arrays and keep the original order of the elements in the array?

Thanks!