[jira] [Commented] (LUCENE-5618) DocValues updates send wrong fieldinfos to codec producers

Shai Erera (JIRA) Tue, 22 Apr 2014 11:13:30 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977133#comment-13977133
 ]


Shai Erera commented on LUCENE-5618:
------------------------------------

bq. I think its a design flaw in how this stuff is written

Today we pass only the FIs the Codec should "care" about, rather than all the 
FIs it "knows" about. This allows the Codec to optimize, although today we 
don't take advantage of that: a DVP reads all the metadata of a field even if 
the field isn't passed in the FIS (and therefore will never be asked for).

If we write each field in its own gen, then since we don't allow adding new 
fields through dvUpdates, for gen=-1 we just pass all known dvFieldInfos, and 
for gen > 0 we pass a single FI only. The Codec therefore always receives only 
FIs it knows about, even though for gen=-1 it is given some FIs it shouldn't 
care about. Our Codecs only read metadata into memory and load the actual data 
lazily, so perhaps optimizing them is less important at the moment.
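
To make the one-field-per-gen selection concrete, here is a minimal sketch. 
It does not use the real Lucene classes: GenFieldInfosSketch, Field and 
fieldsForGen are hypothetical names invented for illustration, and the rule 
that a gen other than -1 holds exactly one field is the assumption described 
above, not existing behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class GenFieldInfosSketch {
    // Hypothetical stand-in for a doc-values field: its number, its name, and
    // the generation its values were last written in (-1 = never updated).
    static final class Field {
        final int number;
        final String name;
        final long dvGen;

        Field(int number, String name, long dvGen) {
            this.number = number;
            this.name = name;
            this.dvGen = dvGen;
        }
    }

    // Returns the fields a producer opened for `gen` would be given, assuming
    // every updated field is written in its own generation.
    static List<Field> fieldsForGen(List<Field> allDvFields, long gen) {
        if (gen == -1) {
            // Base generation: pass every known doc-values field, including
            // fields whose current values live in a later generation.
            return new ArrayList<>(allDvFields);
        }
        List<Field> selected = new ArrayList<>();
        for (Field f : allDvFields) {
            if (f.dvGen == gen) {
                selected.add(f);
            }
        }
        // Under the one-field-per-gen assumption, anything else is a bug.
        if (selected.size() != 1) {
            throw new IllegalStateException(
                "expected exactly one field for gen=" + gen + ", got " + selected.size());
        }
        return selected;
    }
}
```

With fields a (never updated) and b (updated in gen 3), fieldsForGen(all, -1) 
returns both fields, while fieldsForGen(all, 3) returns only b.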

I wish we could be more flexible though in our code. It feels odd to me that 
each field is written in its own gen, just because we cannot add a FIS.exists() 
check in the Codec. Like, if we always pass all DV FIS to every DVP, each will 
be able to do the exists() check, but a DVP will see fields it doesn't know 
about. Is that bad? It still covers the corruption case of a bad field number 
being encoded in the first place...
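
A rough sketch of the exists() idea, to show what the check would and would 
not catch. This is not the actual DocValuesProducer code: ExistsCheckSketch 
and readMetadata are hypothetical names, and plain integer sets stand in for 
the FieldInfos.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class ExistsCheckSketch {
    // Hypothetical metadata reader: `encodedFieldNumbers` are the field numbers
    // found in one generation's metadata, `allDvFieldNumbers` are all DV fields
    // of the segment (the "pass all DV FIS to every DVP" variant).
    static Set<Integer> readMetadata(int[] encodedFieldNumbers,
                                     Set<Integer> allDvFieldNumbers) {
        Set<Integer> served = new LinkedHashSet<>();
        for (int number : encodedFieldNumbers) {
            // The exists() check: a number that no FieldInfo carries means the
            // file is corrupt, regardless of which gen it belongs to.
            if (!allDvFieldNumbers.contains(number)) {
                throw new IllegalStateException("invalid field number: " + number);
            }
            served.add(number);
        }
        // Fields that were passed in but never appear in this metadata are
        // simply ignored; the producer "sees" them but serves nothing for them.
        return served;
    }
}
```

Here a producer whose metadata encodes only field 1 accepts the full set 
{0, 1, 2} and serves just field 1, while an encoded field 9 is rejected as 
corruption.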

> DocValues updates send wrong fieldinfos to codec producers
> ----------------------------------------------------------
>
>                 Key: LUCENE-5618
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5618
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>
> Spinoff from LUCENE-5616.
> See the example there, docvalues readers get a fieldinfos, but it doesn't 
> contain the correct ones, so they have invalid field numbers at read time.
> This should really be fixed. Maybe a simple solution is to not write 
> "batches" of fields in updates but just have only one field per gen? 
> This removes many-many relationships and would make things easy to understand.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

