[jira] [Commented] (LUCENE-8551) Purge unused FieldInfo on segment merge

Adrien Grand (JIRA) Thu, 15 Nov 2018 06:32:10 -0800


    [ https://issues.apache.org/jira/browse/LUCENE-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688146#comment-16688146 ]


Adrien Grand commented on LUCENE-8551:
--------------------------------------

That would have overhead for sure. For instance, it's not cheap to know which 
fields are used in stored fields: the only way to do this is to iterate over 
all documents and compute the set of used field names. In contrast, merging can 
often copy raw compressed bytes and skip decompressing and decoding entirely.
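
To make the cost concrete, here is a minimal sketch of the "iterate over all 
documents" approach. It is a toy model and not Lucene code: documents are plain 
maps of stored field name to value, and the class and method names are made up 
for illustration. In Lucene itself the equivalent would presumably go through a 
StoredFieldVisitor that records field names, which still means visiting every 
document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model (assumption): a document is a map of stored field name -> value.
class UsedStoredFields {

    // Visits every document and collects the names of fields that carry a
    // stored value. Cost grows with the total number of stored fields.
    static Set<String> usedFieldNames(List<Map<String, String>> docs) {
        Set<String> used = new TreeSet<>();
        for (Map<String, String> doc : docs) {
            used.addAll(doc.keySet());
        }
        return used;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();

        Map<String, String> d1 = new LinkedHashMap<>();
        d1.put("id", "1");
        d1.put("title", "foo");
        docs.add(d1);

        Map<String, String> d2 = new LinkedHashMap<>();
        d2.put("id", "2");
        d2.put("body", "bar");
        docs.add(d2);

        // Prints the sorted set of field names seen in any document.
        System.out.println(usedFieldNames(docs));
    }
}
```

The point of the sketch is only the shape of the loop: nothing short of a full 
pass over the documents yields the set, whereas a bulk merge can avoid looking 
inside the documents at all.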

I'm also a bit worried about the fact that a field could be added back with a 
different number or with different options. For instance, in the NRT case, 
couldn't you end up with two consecutive point-in-time views of the same index 
that disagree on the FieldInfo of a field?
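
The concern about field numbers can be illustrated with a small sketch. 
ToyFieldNumbers is a hypothetical registry written for this example, not 
Lucene's own bookkeeping; it simply hands out increasing numbers and forgets a 
field when it is purged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical name -> number registry (assumption, not Lucene's class).
class ToyFieldNumbers {
    private final Map<String, Integer> numbers = new HashMap<>();
    private int next = 0;

    // Returns the existing number for a field, or assigns the next free one.
    int addOrGet(String name) {
        Integer n = numbers.get(name);
        if (n == null) {
            n = next++;
            numbers.put(name, n);
        }
        return n;
    }

    // Models purging an unused field: the registry forgets its number.
    void purge(String name) {
        numbers.remove(name);
    }

    // Copy of the current mapping, standing in for a point-in-time view.
    Map<String, Integer> snapshot() {
        return new HashMap<>(numbers);
    }

    public static void main(String[] args) {
        ToyFieldNumbers fn = new ToyFieldNumbers();
        fn.addOrGet("id");
        fn.addOrGet("title");
        Map<String, Integer> view1 = fn.snapshot();

        fn.purge("title");     // "title" looked unused at merge time
        fn.addOrGet("body");   // a new field arrives in the meantime
        fn.addOrGet("title");  // "title" comes back and gets a fresh number
        Map<String, Integer> view2 = fn.snapshot();

        // The two views now disagree on the number of "title".
        System.out.println(view1.get("title") + " vs " + view2.get("title"));
    }
}
```

In this toy model the two snapshots are both valid on their own, yet they 
assign different numbers to the same field name, which is the kind of 
disagreement described above.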

> Purge unused FieldInfo on segment merge
> ---------------------------------------
>
>                 Key: LUCENE-8551
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8551
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: David Smiley
>            Priority: Major
>
> If a field is effectively unused (no norms, terms index, term vectors, 
> docValues, stored value, points index), it will nonetheless hang around in 
> FieldInfos indefinitely.  It would be nice to be able to recognize an unused 
> FieldInfo and allow it to disappear after a merge (or two).
> SegmentMerger merges FieldInfo (from each segment) as nearly the first thing 
> it does.  After that, it merges the different index parts, before it's known 
> what's "used" or not.  After writing, we theoretically know which fields are 
> used or not, though we're not doing any bookkeeping to track it.  Maybe we should 
> track the fields used during writing so we write a filtered merged fieldInfo 
> at the end instead of unfiltered up front?  Or perhaps upon reading a 
> segment, we make it cheap/easy for each index type (e.g. terms index, stored 
> fields, ...) to know which fields have data for the corresponding type.  
> Then, on a subsequent merge, we know up front to filter the FieldInfos.
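
The first option in the quoted description (track the fields used during 
writing, then write a filtered FieldInfos at the end) could look roughly like 
the sketch below. UsedFieldTracker and its methods are hypothetical names 
introduced here; fields are reduced to plain strings, and nothing in it is an 
existing Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical bookkeeping shared by the per-part writers during a merge.
class UsedFieldTracker {
    private final Set<String> used = new HashSet<>();

    // Called by a writer (terms, stored fields, ...) when it writes data
    // for the given field.
    void markUsed(String fieldName) {
        used.add(fieldName);
    }

    // Keeps only the merged fields that some writer marked as used,
    // preserving the order of the merged list.
    List<String> filter(List<String> mergedFieldNames) {
        List<String> kept = new ArrayList<>();
        for (String name : mergedFieldNames) {
            if (used.contains(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> merged = Arrays.asList("id", "title", "legacy_flag");

        UsedFieldTracker tracker = new UsedFieldTracker();
        tracker.markUsed("id");     // e.g. reported by the stored fields writer
        tracker.markUsed("title");  // e.g. reported by the terms writer
        tracker.markUsed("id");     // reporting a field twice is harmless

        // "legacy_flag" was never written, so it is filtered out.
        System.out.println(tracker.filter(merged));
    }
}
```

The sketch leaves out the hard part raised in the comment: a writer that 
copies raw compressed bytes never sees individual fields, so it would have 
nothing to report to such a tracker.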



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
