I agree this inconsistency is bad, and silently losing data (a float 2.5 becoming the int 2) is really bad. We should fix this before 4.0.
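To make case 1 concrete, assume the int writer narrows the incoming float with a plain Java cast. That is an assumption: IntsWriter's actual code path isn't shown in this thread. Under that assumption, the loss is exactly Java's narrowing primitive conversion. The helper `toIntExactly` is hypothetical. It shows the kind of lossless check that would turn silent truncation into a clear exception.

```java
class NarrowingDemo {
    // Hypothetical strict conversion: refuse any float an int cannot represent exactly.
    static int toIntExactly(float value) {
        // (int) rounds toward zero and clamps out-of-range values (NaN becomes 0),
        // so compare the round trip, and separately reject 2^31, which survives
        // clamping to Integer.MAX_VALUE and back to float unchanged.
        int asInt = (int) value;
        if ((float) asInt != value || value >= 2147483648f) {
            throw new IllegalArgumentException(
                "float " + value + " cannot be stored as an int without loss");
        }
        return asInt;
    }

    public static void main(String[] args) {
        float value = 2.5f;
        // Narrowing primitive conversion: the fractional part is silently dropped.
        int truncated = (int) value;
        System.out.println(value + " -> " + truncated); // prints "2.5 -> 2"

        System.out.println(toIntExactly(4.0f)); // prints "4"
        try {
            toIntExactly(value); // 2.5f is rejected instead of becoming 2
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```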
I would prefer idea 2: we never allow changing or promoting the DV type for a given field, and we do our best to throw a clear exception if you try. I realize this differs from other parts of Lucene, where "anything goes", but DV is new in 4.0, so we are free to set new rules. Also, if this later proves to be a bad decision, we can always add the leniency back in ... but not vice versa.

Mike McCandless

http://blog.mikemccandless.com

On Mon, May 28, 2012 at 3:53 PM, Robert Muir <[email protected]> wrote:
> Hello,
>
> Just doing some playing around, I wanted to see what happens if you
> change up a docvalues type across different documents in a single IW
> session, e.g.
>
> case 1:
> doc1.add(new IntDocValuesField("foo", 5))
> doc2.add(new FloatDocValuesField("foo", 2.5f))
>
> in this case the 2.5f is truncated to an int and becomes a 2
>
> case 2:
> doc3.add(new StraightBytesDocValuesField("foo", new BytesRef("boo!")))
>
> in this case you hit an NPE in IntsWriter, because the straightbytes
> impl naturally cannot return an int value.
>
> So I'm wondering what we should do?
> Currently both merging and multidocvalues do a type promotion, but if
> the change happens in the same IW session, no promotion happens.
>
> idea 1: throw an exception if the type is changed in one session. This
> leaves things a little inconsistent, but prevents strange results.
> idea 2: throw an exception if the type is changed *and also on
> merge/multidocvalues*. This seems a little cruel (no way to upgrade
> your short to int if you need to later) but would simplify some code.
> (evil) idea 3: force a flush if the type is changed and let merging
> take care of it.
> idea 4: buffer docvalues in RAM in IW instead of inside the codec, in
> a "type-independent way" (e.g. sorted hash of the unique byte values +
> per-doc ords).
> This is a lot of work, but it would make the codec side of
> DV simpler, as the codec would just do encode/decode and wouldn't have to do RAM
> accounting or deal with types changing or any of that.
>
> Any other ideas?
>
>
> --
> lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
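Idea 2, the option preferred in the reply above, amounts to a per-field "first type wins" rule enforced both while indexing and at merge time. The sketch below is minimal and hypothetical. `DvType`, `DocValuesTypeRegistry`, `checkAndRecord`, and `requireSameType` are made-up names, not Lucene's API, and the three-value enum stands in for the real, finer-grained DocValues types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the DocValues type; not Lucene's actual enum.
enum DvType { INT, FLOAT, BYTES }

// Hypothetical per-field registry enforcing idea 2: the first type seen for a field
// wins, and any later, different type is rejected instead of promoted or truncated.
class DocValuesTypeRegistry {
    private final Map<String, DvType> typeByField = new HashMap<>();

    // Would be called for every document value added during an indexing session.
    void checkAndRecord(String field, DvType type) {
        DvType existing = typeByField.putIfAbsent(field, type);
        if (existing != null && existing != type) {
            throw new IllegalArgumentException("cannot change DocValues type from "
                + existing + " to " + type + " for field \"" + field + "\"");
        }
    }

    // Same rule at merge / multi-segment read time: all segments must agree.
    // Returns the agreed type, or null if the list is empty.
    static DvType requireSameType(String field, List<DvType> perSegmentTypes) {
        DvType first = null;
        for (DvType t : perSegmentTypes) {
            if (first == null) {
                first = t;
            } else if (t != first) {
                throw new IllegalArgumentException("segments disagree on DocValues type for field \""
                    + field + "\": " + first + " vs " + t);
            }
        }
        return first;
    }
}

class DocValuesTypeCheckDemo {
    public static void main(String[] args) {
        DocValuesTypeRegistry registry = new DocValuesTypeRegistry();
        registry.checkAndRecord("foo", DvType.INT);       // doc1: an int value
        try {
            registry.checkAndRecord("foo", DvType.FLOAT); // doc2: rejected, not truncated
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Rejecting at both points removes the promotion code from merging entirely. Under this rule, case 1 and case 2 become the same clear error rather than a silent truncation and an NPE.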
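Idea 4 from Robert's message can be sketched in a few lines. Each document's value is buffered as raw bytes and deduplicated through a hash. Only at flush are the unique values sorted and the per-document ords handed to the codec. This is a minimal, hypothetical sketch: the class and method names are made up, and it assumes int and float values are encoded to bytes before buffering. It also ignores RAM accounting and documents without a value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical type-independent buffer: unique byte values plus per-doc ords.
class TypeIndependentBuffer {
    // Dedup table: ByteBuffer.equals/hashCode compare content, unlike byte[].
    private final Map<ByteBuffer, Integer> idByValue = new HashMap<>();
    private final List<byte[]> uniqueValues = new ArrayList<>(); // id -> bytes, insertion order
    private final List<Integer> docToId = new ArrayList<>();     // doc -> id

    void addDocument(byte[] value) {
        byte[] copy = value.clone();
        ByteBuffer key = ByteBuffer.wrap(copy);
        Integer id = idByValue.get(key);
        if (id == null) {
            id = uniqueValues.size();
            idByValue.put(key, id);
            uniqueValues.add(copy);
        }
        docToId.add(id);
    }

    // Unique values in unsigned byte order; a codec would encode these as it likes.
    byte[][] sortedValues() {
        byte[][] sorted = uniqueValues.toArray(new byte[0][]);
        Arrays.sort(sorted, (a, b) -> Arrays.compareUnsigned(a, b));
        return sorted;
    }

    // Per-document ords into sortedValues(), computed only at flush time.
    int[] docOrds() {
        int n = uniqueValues.size();
        Integer[] idsInValueOrder = new Integer[n];
        for (int i = 0; i < n; i++) idsInValueOrder[i] = i;
        Arrays.sort(idsInValueOrder,
            (a, b) -> Arrays.compareUnsigned(uniqueValues.get(a), uniqueValues.get(b)));
        int[] ordById = new int[n];
        for (int ord = 0; ord < n; ord++) ordById[idsInValueOrder[ord]] = ord;
        int[] ords = new int[docToId.size()];
        for (int doc = 0; doc < ords.length; doc++) ords[doc] = ordById[docToId.get(doc)];
        return ords;
    }
}

class TypeIndependentBufferDemo {
    public static void main(String[] args) {
        TypeIndependentBuffer buf = new TypeIndependentBuffer();
        buf.addDocument(ByteBuffer.allocate(4).putInt(5).array());  // an int value
        buf.addDocument("boo!".getBytes(StandardCharsets.UTF_8));  // a bytes value
        buf.addDocument(ByteBuffer.allocate(4).putInt(5).array());  // duplicate of doc 0
        System.out.println(Arrays.toString(buf.docOrds()));         // prints "[0, 1, 0]"
    }
}
```

Because the buffer never sees a type, a mid-session type change cannot break it. The decision about how to encode, or whether to reject, moves to flush time. That is the simplification described for the codec side, at the cost of the extra indexing-side work the message mentions.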
