[ https://issues.apache.org/jira/browse/LUCENE-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531510 ]
Yonik Seeley commented on LUCENE-1012: -------------------------------------- > We could just fix the javadocs to match the current approach? That sounds like the right approach. > Problems with maxMergeDocs parameter > ------------------------------------ > > Key: LUCENE-1012 > URL: https://issues.apache.org/jira/browse/LUCENE-1012 > Project: Lucene - Java > Issue Type: Bug > Components: Index > Reporter: Michael Busch > Priority: Minor > Fix For: 2.3 > > > I found two possible problems regarding IndexWriter's maxMergeDocs value. I'm > using the following code to test maxMergeDocs: > {code:java} > public void testMaxMergeDocs() throws IOException { > final int maxMergeDocs = 50; > final int numSegments = 40; > > MockRAMDirectory dir = new MockRAMDirectory(); > IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), > true); > writer.setMergePolicy(new LogDocMergePolicy()); > writer.setMaxMergeDocs(maxMergeDocs); > Document doc = new Document(); > doc.add(new Field("field", "aaa", Field.Store.YES, Field.Index.TOKENIZED, > Field.TermVector.WITH_POSITIONS_OFFSETS)); > for (int i = 0; i < numSegments * maxMergeDocs; i++) { > writer.addDocument(doc); > //writer.flush(); // uncomment to avoid the DocumentsWriter bug > } > writer.close(); > > new SegmentInfos.FindSegmentsFile(dir) { > protected Object doBody(String segmentFileName) throws > CorruptIndexException, IOException { > SegmentInfos infos = new SegmentInfos(); > infos.read(directory, segmentFileName); > for (int i = 0; i < infos.size(); i++) { > assertTrue(infos.info(i).docCount <= maxMergeDocs); > } > return null; > } > }.run(); > } > {code} > > - It seems that DocumentsWriter does not obey the maxMergeDocs parameter. If > I don't flush manually, then the index only contains one segment at the end > and the test fails. > - If I flush manually after each addDocument() call, then the index contains > more segments. But still, there are segments that contain more docs than > maxMergeDocs, e. g. 55 vs. 50. The javadoc in IndexWriter says: > {code:java} > /** > * Returns the largest number of documents allowed in a > * single segment. > * > * @see #setMaxMergeDocs > */ > public int getMaxMergeDocs() { > return getLogDocMergePolicy().getMaxMergeDocs(); > } > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]