Thanks, my mistake, I see it now:

> LUCENE-2846: omitNorms now behaves like omitTermFrequencyAndPositions, if you
> omitNorms(true) for field "a" for 1000 documents, but then add a document with
> omitNorms(false) for field "a", all documents for field "a" will have no norms.
> Previously, Lucene would fill the first 1000 documents with "fake norms" from
> Similarity.getDefault(). (Robert Muir, Mike Mccandless)

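
To make the flipped semantics concrete, here is a minimal sketch of the "sticky" rule described in that CHANGES entry. The class StickyOmitNormsSketch and its methods are hypothetical stand-ins for illustration only, not the actual FieldInfo code; the only assumption carried over from the thread is that once any document omits norms for a field, the whole field ends up without norms.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for per-field metadata; NOT the real Lucene FieldInfo. */
public class StickyOmitNormsSketch {
    // field name -> true if norms are omitted for the whole field
    private final Map<String, Boolean> omitNormsByField = new HashMap<>();

    /** Records the omitNorms setting requested by one more document for this field. */
    public void update(String field, boolean omitNorms) {
        // Once any document asks to omit norms, the flag stays true for the field.
        omitNormsByField.merge(field, omitNorms, (prev, cur) -> prev || cur);
    }

    /** True only if the field was seen and no document ever asked to omit norms. */
    public boolean hasNorms(String field) {
        Boolean omitted = omitNormsByField.get(field);
        return omitted != null && !omitted;
    }

    public static void main(String[] args) {
        StickyOmitNormsSketch infos = new StickyOmitNormsSketch();
        infos.update("c", false); // like the first test document (Index.ANALYZED)
        infos.update("c", true);  // like the second one (NOT_ANALYZED_NO_NORMS)
        System.out.println(infos.hasNorms("c")); // prints false
    }
}
```

Under this rule the order of the two documents does not matter: a single omitNorms(true) for a field is enough to leave it without norms, which matches what the test in this thread observes on trunk.
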
I somehow misread the comment in the code, "remains off for life": I expected the old behavior and read "off" as "not omitted", where "off" here actually stands for "omitted". All clear now, thanks!

Doron

On Thu, May 26, 2011 at 3:26 PM, Shai Erera <[email protected]> wrote:

> Sorry Doron, I opened LUCENE-3146 to track this and forgot to update this
> thread.
>
> Mike already commented that this is expected behavior in 4.0 (semantics
> were flipped) however we still need to fix some jdocs + there seems to be
> another problem that app may succeed to setNorm, only for that norm be
> discarded on the next merge.
>
> Shai
>
>
> On Thu, May 26, 2011 at 3:11 PM, Doron Cohen <[email protected]> wrote:
>
>> Yes I see this too in trunk r1127436 and it seems a bug.
>> If you uncomment the line that adds the field with NO_NORMS the file is
>> there as expected.
>>
>> I think I know where the bug is:
>> FieldInfo.update() has the wrong logic here:
>>
>> {code}
>> if (this.omitNorms != omitNorms) {
>>   this.omitNorms = true; // if one require omitNorms at least once, it remains off for life
>> }
>> {code}
>>
>> It should of course be changed to set false in this case.
>>
>> Doron
>>
>>
>> On Thu, May 26, 2011 at 11:32 AM, Shai Erera <[email protected]> wrote:
>>
>>> Hi
>>>
>>> I wrote the following test:
>>>
>>> {code}
>>> public void testConfusingNorms() throws Exception {
>>>   Directory dir = newDirectory();
>>>   LogMergePolicy lmp = newLogMergePolicy(false);
>>>   IndexWriterConfig conf = newIndexWriterConfig(TEST_VERSION_CURRENT,
>>>       new MockAnalyzer(random)).setMergePolicy(lmp);
>>>   IndexWriter w = new IndexWriter(dir, conf);
>>>   Document doc = new Document();
>>>   doc.add(new Field("c", "some text", Store.YES, Index.ANALYZED));
>>>   w.addDocument(doc);
>>>   doc = new Document();
>>>   doc.add(new Field("c", "delete", Store.NO,
>>>       Index.NOT_ANALYZED_NO_NORMS));
>>>   w.addDocument(doc);
>>>   w.close();
>>>
>>>   IndexReader r = IndexReader.open(dir, false);
>>>   r.setNorm(0, "c", (byte) 1);
>>>   r.close();
>>>
>>>   // Look for the sep norms file
>>>   boolean found = false;
>>>   for (String s : dir.listAll()) {
>>>     if (IndexFileNames.isSeparateNormsFile(s)) {
>>>       found = true;
>>>       break;
>>>     }
>>>   }
>>>   assertTrue("separate norms file not found", found);
>>>
>>>   dir.close();
>>> }
>>> {code}
>>>
>>> You will also need to add that method to IndexFileNames (not committed
>>> yet):
>>> {code}
>>> /**
>>>  * Returns true if the given filename ends with the separate norms file
>>>  * pattern: {@code SEPARATE_NORMS_EXTENSION + "[0-9]+"}.
>>>  */
>>> public static boolean isSeparateNormsFile(String filename) {
>>>   int idx = filename.lastIndexOf('.');
>>>   if (idx == -1) return false;
>>>   String ext = filename.substring(idx + 1);
>>>   return Pattern.matches(SEPARATE_NORMS_EXTENSION + "[0-9]+", ext);
>>> }
>>> {code}
>>>
>>> The test adds two documents with a field "c", one analyzed and one not
>>> and also no norms. According to "NOT_ANALYZED_NO_NORMS":
>>>
>>>> Note that once you index a given field *with* norms enabled, disabling
>>>> norms will have no effect.
>>>> In other words, for this to have the above described effect on a field,
>>>> all instances of that field
>>>> must be indexed with NOT_ANALYZED_NO_NORMS from the beginning.
>>>>
>>>
>>> I'd expect that since I add one instance of the field w/ norms enabled,
>>> then norms will exist for that field, however that's not the case.
>>>
>>> The code which sets the norms by IndexReader does not do anything,
>>> because SegmentReader.doSetNorms thinks this is not an indexed field (or
>>> assuming the documentation is wrong, a field w/o norms):
>>>
>>> protected void doSetNorm(int doc, String field, byte value) throws
>>> IOException {
>>>   SegmentNorms norm = norms.get(field);
>>>   if (norm == null) // not an indexed field
>>>     return;
>>>
>>> The same test runs fine on 3x, so I assume there is a bug in the code
>>> somewhere only on trunk?
>>>
>>> Shai
>>>
>>
>>
>
