[ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531433 ]
Michael McCandless commented on LUCENE-994:
-------------------------------------------

> While trying Solr with the latest Lucene, I ran into this
> back-incompatibility:
> Caused by: java.lang.IllegalArgumentException: this method can only be called
> when the merge policy is LogDocMergePolicy
>   at org.apache.lucene.index.IndexWriter.getLogDocMergePolicy(IndexWriter.java:316)
>   at org.apache.lucene.index.IndexWriter.setMaxMergeDocs(IndexWriter.java:768)
>
> It's not an issue at all for Solr - we'll fix things up when we
> officially upgrade Lucene versions, but it does seem like it might
> affect a number of apps that try and just drop in a new lucene
> jar. Thoughts?

Hmm, good catch. This should only happen when "setMaxMergeDocs" is
called (it is the only method that requires a LogDocMergePolicy).

I think we have various options:

1. Leave things as is and put an up-front comment in the release
   saying you could either switch to LogDocMergePolicy or use
   "setMaxMergeMB" on the default LogByteSizeMergePolicy instead.
   Also put details in the javadocs for this method explaining
   these options.

2. Switch back to LogDocMergePolicy by default "out of the box".

3. If setMaxMergeDocs() is called, switch back to LogDocMergePolicy
   "on-demand".

4. Modify LogByteSizeMergePolicy to accept both "maxMergeDocs" and
   "maxMergeMB", allowing either one or both, just as "flush by RAM"
   and/or "flush by doc count" is being done in LUCENE-1007.

I think I like option 4 the best. Option 3 seems too magical (it
violates the "principle of least surprise"). Option 2 I think is bad
because it's best to match the merge policy with how we are flushing
(by RAM by default). Option 1 is clearly disruptive to people who want
to drop the Lucene JAR in and test.

I'll open a new issue. Thanks Yonik!
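To make option 4 concrete, here is a rough, self-contained sketch of a
merge policy that accepts a doc-count limit, a size limit, or both. It
is only an illustration: the class DualLimitMergeSketch and its
isMergeable method are hypothetical names and are not part of Lucene,
and the rule that a segment sitting exactly at a limit is still
mergeable is an assumption made for the example.

```java
/**
 * Hypothetical sketch of option 4 (NOT Lucene's LogByteSizeMergePolicy):
 * one policy object that honors a doc-count limit and/or a byte-size limit.
 */
public class DualLimitMergeSketch {
    // Both limits start out disabled, so setting only one of them works.
    private int maxMergeDocs = Integer.MAX_VALUE;
    private long maxMergeBytes = Long.MAX_VALUE;

    /** Largest segment, in documents, that may still be merged. */
    public void setMaxMergeDocs(int maxMergeDocs) {
        this.maxMergeDocs = maxMergeDocs;
    }

    /** Largest segment, in megabytes, that may still be merged. */
    public void setMaxMergeMB(double mb) {
        // Casting an oversized double to long saturates at Long.MAX_VALUE.
        this.maxMergeBytes = (long) (mb * 1024 * 1024);
    }

    /**
     * Assumption for this sketch: a segment is mergeable only if it is
     * at or below every limit that has been set.
     */
    public boolean isMergeable(int docCount, long sizeInBytes) {
        return docCount <= maxMergeDocs && sizeInBytes <= maxMergeBytes;
    }

    public static void main(String[] args) {
        DualLimitMergeSketch policy = new DualLimitMergeSketch();
        policy.setMaxMergeDocs(10000);   // old-style limit keeps working
        policy.setMaxMergeMB(512.0);     // size limit can be combined with it
        System.out.println(policy.isMergeable(5000, 100L * 1024 * 1024));
        System.out.println(policy.isMergeable(20000, 100L * 1024 * 1024));
    }
}
```

With a shape like this, an application that only ever calls
setMaxMergeDocs would keep working against the byte-size default
without hitting the IllegalArgumentException quoted above.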

> Change defaults in IndexWriter to maximize "out of the box" performance
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-994
>                 URL: https://issues.apache.org/jira/browse/LUCENE-994
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-994.patch, writerinfo.zip
>
>
> This is follow-through from LUCENE-845, LUCENE-847 and LUCENE-870;
> I'll commit this once those three are committed.
> Out of the box performance of IndexWriter is maximized when flushing
> by RAM instead of a fixed document count (the default today) because
> documents can vary greatly in size.
> Likewise, merging performance should be faster when merging by net
> segment size since, to minimize the net IO cost of merging segments
> over time, you want to merge segments of equal byte size.
> Finally, ConcurrentMergeScheduler improves indexing speed
> substantially (25% in a simple initial test in LUCENE-870) because it
> runs the merges in the background and doesn't block
> add/update/deleteDocument calls. Most machines have concurrency
> between CPU and IO and so it makes sense to default to this
> MergeScheduler.
> Note that these changes will break users of ParallelReader because the
> parallel indices will no longer have matching docIDs. Such users need
> to switch IndexWriter back to flushing by doc count, and switch the
> MergePolicy back to LogDocMergePolicy. It's likely also necessary to
> switch the MergeScheduler back to SerialMergeScheduler to ensure
> deterministic docID assignment.
> I think the combination of these three default changes, plus other
> performance improvements for indexing (LUCENE-966, LUCENE-843,
> LUCENE-963, LUCENE-969, LUCENE-871, etc.) should make for some sizable
> performance gains in Lucene 2.3!

--
This message is automatically generated by JIRA.