[
https://issues.apache.org/jira/browse/LUCENE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977133#comment-13977133
]
Shai Erera commented on LUCENE-5618:
------------------------------------
bq. I think its a design flaw in how this stuff is written
Today we pass only the FIs the Codec should "care" about, rather than all the
FIs it "knows" about. This allows the Codec to optimize, although today we
don't take advantage of that: a DVP reads all the metadata of a field, even
if the field isn't passed in the FIS (and therefore will never be asked for).
If we write each field in its own gen, then, since we don't allow adding new
fields through dvUpdates, for gen=-1 we just pass all known dvFieldInfos, and
for gen > 0 we pass a single FI only. The Codec therefore always receives the
FIs it knows about, even though for gen=-1 it is given some FIs it shouldn't
care about. Our Codecs only read metadata into memory and load the actual
data lazily, so perhaps optimizing them is less important at the moment.
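
To make the two schemes concrete, here is a rough sketch in plain Java. It is
not Lucene code: the Field class and both method names are made up for
illustration, and it assumes that "care about" means "the field's dvGen
matches the gen being opened".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GenFieldInfosSketch {

    /** Hypothetical stand-in for a FieldInfo: name, number, DV gen (-1 = never updated). */
    public static final class Field {
        public final String name;
        public final int number;
        public final long dvGen;

        public Field(String name, int number, long dvGen) {
            this.name = name;
            this.number = number;
            this.dvGen = dvGen;
        }
    }

    /** Scheme described as "today": a gen's producer only sees fields whose dvGen matches. */
    public static Map<Long, List<Field>> passOnlyCaredAbout(List<Field> dvFields) {
        Map<Long, List<Field>> perGen = new TreeMap<>();
        for (Field f : dvFields) {
            perGen.computeIfAbsent(f.dvGen, g -> new ArrayList<>()).add(f);
        }
        return perGen;
    }

    /** One field per gen: gen=-1 sees every known DV field, each update gen sees one field. */
    public static Map<Long, List<Field>> passOneFieldPerGen(List<Field> dvFields) {
        Map<Long, List<Field>> perGen = new TreeMap<>();
        // Updates cannot add new fields, so the original files know every DV field.
        perGen.put(-1L, new ArrayList<>(dvFields));
        for (Field f : dvFields) {
            if (f.dvGen == -1) {
                continue; // never updated, only lives in the original files
            }
            if (perGen.containsKey(f.dvGen)) {
                throw new IllegalStateException("gen " + f.dvGen + " already holds a field");
            }
            perGen.put(f.dvGen, Collections.singletonList(f));
        }
        return perGen;
    }

    public static void main(String[] args) {
        List<Field> fields = new ArrayList<>();
        fields.add(new Field("a", 0, -1));
        fields.add(new Field("b", 1, 1));
        fields.add(new Field("c", 2, 2));
        for (Map.Entry<Long, List<Field>> e : passOneFieldPerGen(fields).entrySet()) {
            System.out.println("gen=" + e.getKey() + " receives " + e.getValue().size() + " FI(s)");
        }
    }
}
```

In the second method the gen=-1 producer is handed "b" and "c" although their
current values live in later gens, which is the "FIs it shouldn't care about"
case mentioned above.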
I wish we could be more flexible in our code, though. It feels odd to me that
each field would be written in its own gen just because we cannot add a
FIS.exists() check in the Codec. For example, if we always pass all DV FIs to
every DVP, each will be able to do the exists() check, but a DVP will see
fields it doesn't know about. Is that bad? It still covers the corruption case
of a bad field number being encoded in the first place...
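
A minimal sketch of what I mean by the exists() check, again in plain Java
rather than against the real Codec API. The readMetadata method and its
parameters are hypothetical, and IllegalStateException only stands in for
whatever corruption exception a real producer would throw.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class ExistsCheckSketch {

    /**
     * Hypothetical metadata pass of a producer that is handed ALL DV field numbers.
     * encodedFieldNumbers: field numbers decoded from this producer's own metadata file.
     * allDvFieldNumbers: every DV field number of the segment, including fields
     * this producer has no data for.
     */
    public static Set<Integer> readMetadata(int[] encodedFieldNumbers, Set<Integer> allDvFieldNumbers) {
        Set<Integer> loaded = new LinkedHashSet<>();
        for (int number : encodedFieldNumbers) {
            // The exists() check: a number that no FieldInfo carries means corruption.
            if (!allDvFieldNumbers.contains(number)) {
                throw new IllegalStateException("invalid field number: " + number);
            }
            loaded.add(number);
        }
        // Fields in allDvFieldNumbers that never appear in the file are simply ignored.
        return loaded;
    }

    public static void main(String[] args) {
        Set<Integer> allDvFields = new HashSet<>(Arrays.asList(0, 1, 2));
        System.out.println("loaded: " + readMetadata(new int[] {1}, allDvFields));
        try {
            readMetadata(new int[] {7}, allDvFields);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The producer ends up knowing only about field 1 even though it was also shown
fields 0 and 2, and the bogus field number is still rejected.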
> DocValues updates send wrong fieldinfos to codec producers
> ----------------------------------------------------------
>
> Key: LUCENE-5618
> URL: https://issues.apache.org/jira/browse/LUCENE-5618
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
>
> Spinoff from LUCENE-5616.
> See the example there, docvalues readers get a fieldinfos, but it doesn't
> contain the correct ones, so they have invalid field numbers at read time.
> This should really be fixed. Maybe a simple solution is to not write
> "batches" of fields in updates but just have only one field per gen?
> This removes many-many relationships and would make things easy to understand.