[jira] [Commented] (LUCENE-4627) Migration layer for facets

Shai Erera (JIRA) Mon, 21 Jan 2013 03:22:17 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558696#comment-13558696
 ]


Shai Erera commented on LUCENE-4627:
------------------------------------

So, LUCENE-4602 went out without the migration layer. Rather, it provides 
FacetsPayloadMigrationReader, which does a one-time migration of payload data 
to DocValues.

The DOCS_ONLY case doesn't require migration per se. It is OK for old indexes 
to retain the existing drill-down terms' posting lists with positions, as this 
information won't be accessed during search. New documents will be added w/ 
DOCS_ONLY, and thus positions will be dropped from old segments as they are 
merged.

The other side of the migration story is migrating how facets are encoded in 
the DocValues. E.g. today they are encoded using dgap+vint. This can be solved 
in two ways, both by the user:

* Provide a CategoryListDataMigratingReader, which will read the data using the 
old IntDecoder and encode using the new IntEncoder. If required, e.g. if we 
come up with a better encoder in LUCENE-4609, we can provide such a utility.

* The user can keep the data encoded as-is, and write a special 
CategoryListParams which returns a CategoryListIterator, whose setNextReader 
loads the appropriate IntDecoder for that reader. Since the reader provides 
information about which version of Lucene created it, it should be an easy task.
** Hmmm ... but as soon as old and new segments are merged, the data in the 
DocValues will be mixed, so that's a viable solution only as long as segments 
aren't merged.
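
For readers unfamiliar with the term, here is a minimal, self-contained sketch 
of what "dgap+vint" means: sorted ordinals are stored as gaps between 
consecutive values, and each gap is written as a variable-length int. The class 
DGapVInt and its methods are hypothetical names for illustration, not the facet 
module's IntEncoder/IntDecoder classes, and the low-order-first byte layout is 
an assumption that may differ from what the module actually writes. A migrating 
reader as described in the first option would do the equivalent of decode() 
with the old scheme followed by encode() with the new one.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration class; not part of Lucene.
public class DGapVInt {

    /** Encodes sorted, non-negative ordinals as gaps, each gap as a variable-length int. */
    public static byte[] encode(int[] sortedOrdinals) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrdinals) {
            int gap = ord - prev; // d-gap: distance from the previous ordinal
            prev = ord;
            // vint: 7 payload bits per byte, high bit set means "more bytes follow"
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    /** Reverses encode(): reads vints and accumulates the gaps back into ordinals. */
    public static int[] decode(byte[] data) {
        int[] result = new int[data.length]; // at most one value per byte
        int count = 0, prev = 0, pos = 0;
        while (pos < data.length) {
            int gap = 0, shift = 0;
            byte b;
            do {
                b = data[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += gap;
            result[count++] = prev;
        }
        return Arrays.copyOf(result, count);
    }

    public static void main(String[] args) {
        int[] ords = {3, 7, 8, 20, 300};
        byte[] enc = encode(ords);
        System.out.println(enc.length + " bytes, round trip ok: "
                + Arrays.equals(decode(enc), ords));
    }
}
```

Because the encoded bytes carry no marker of which scheme produced them, a 
merged segment that simply copies them over cannot tell old data from new, 
which is the problem noted in the sub-bullet above.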

I think that if we introduce another index break for facets, we should 
revisit this. For now, I don't think much needs to be done here.
                
> Migration layer for facets
> --------------------------
>
>                 Key: LUCENE-4627
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4627
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>
> Spin-off from LUCENE-4602 (and LUCENE-4623). It will be good if we can develop 
> some migration layer so that users don't need to re-index their content when 
> we change how facets are written in the index. Currently the two open issues 
> are cut over to DV and index drill-down terms w/ DOCS_ONLY, but in the future 
> there could be other changes.
> I don't think that this layer needs to be very heavy. Something in the form 
> of a FacetsAtomicReaderWrapper. For instance, to support the DV migration, we 
> can implement a PayloadFacetsAtomicReader which translates the payload to DV 
> API (i.e. its docValues() API will actually read from the payload).
> We'd need some API on IW I think to initialize that reader, so that data can 
> be migrated from payload to DV during segment merges.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

