https://issues.apache.org/jira/browse/LUCENE-5672
: Date: Fri, 11 Jul 2014 15:22:40 -0700
: From: Erick Erickson <[email protected]>
: Reply-To: [email protected]
: To: [email protected]
: Subject: Re: Core admin merge indexes, should it trigger merge policy?
:
: It's been a whole hour, you're slowing down.....
:
: I promised the original reporter that there would be a JIRA he could
: track, got one?
:
: Erick
:
: On Fri, Jul 11, 2014 at 3:17 PM, Robert Muir <[email protected]> wrote:
: > You encouraged me to fix it :)
: >
: > On Fri, Jul 11, 2014 at 6:09 PM, Erick Erickson <[email protected]> wrote:
: >> bq: I think you would probably want to control the number of segments
: >> with the MapReduceIndexerTool before doing the merge initially
: >>
: >> This isn't the nub of the issue. Assuming that the number of segments
: >> in the index merged in via MRIT is 1 each time, once that index gets
: >> merged into the live Solr node, the segments don't get merged no
: >> matter how many times another index is merged. I'm aware of an "In the
: >> wild" situation where over 6 months, there are over 600 segments. All
: >> updates were via MRIT.
: >>
: >> run MRIT once, 1 segment
: >> run MRIT a second time, 2 segments
: >> .
: >> .
: >> .
: >> run MRIT the Nth time, N segments (N > 600 in this case)
: >>
: >> So running MRIT N times results in N segments on the Solr node since
: >> merge _indexes_ doesn't trigger _segment_ merging AFAIK.
: >>
: >> This has been masked in the past I'd guess because subsequent
: >> "regular" indexing via SolrJ, post.jar, whatever _does_ then trigger
: >> segment merging. But we haven't seen the situation reported before
: >> where the _only_ way the index gets updated is via index merging.
: >> Index merging is done via MRIT in this case although this has nothing
: >> to do with MRIT and everything to do with the core admin mergeindexes
: >> command. MRIT is only relevant here since it's pretty much the first
: >> tool that conveniently allowed the only updates to be via
: >> mergeindexes.
: >>
: >> I reproduced this locally without MRIT by just taking a stock Solr,
: >> copying the index somewhere else, setting mergeFactor=2 then merging
: >> (and committing) again and again. Stopped at 15 segments or so. Then
: >> sent a couple of updates up via cURL and the segment count dropped
: >> back to 2......
: >>
: >> Whether the right place to fix this is Solr core Admin API
: >> MERGEINDEXES or in the lower-level Lucene call I don't have a strong
: >> opinion about.
: >>
: >> Of course one work-around is to periodically issue an optimize even
: >> though Uwe cringes every time that gets mentioned ;)
: >>
: >> On Fri, Jul 11, 2014 at 2:42 PM, Mark Miller <[email protected]> wrote:
: >>> I think you would probably want to control the number of segments with the MapReduceIndexerTool before doing the merge initially, and if you find you have too many segments over time as you add more and more data, use a force merge call to reduce the number of segments, either manually or scheduled.
: >>>
: >>> --
: >>> Mark Miller
: >>> about.me/markrmiller
: >>>
: >>> On July 11, 2014 at 4:36:22 PM, Erick Erickson ([email protected]) wrote:
: >>>> I think I've become aware of an edge case that I'm wondering is worth
: >>>> a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of
: >>>> indexes and add them one by one to the running Solr node via merge
: >>>> indexes. The mergeFactor appears to be ignored in this scenario.
: >>>> Indeed, I suspect (without proof) that the entire merge policy is
: >>>> never referenced at all.
: >>>>
: >>>> Historically this hasn't mattered, since merging indexes was
: >>>> 1> a rare operation
: >>>> 2> the merge policy _does_ kick in when the index has more documents
: >>>> added to it via the normal (not merge indexes) policy so things would
: >>>> be cleaned up.
: >>>>
: >>>> All that said, the mapReduceIndexerTool is a scenario where we may be
: >>>> merging multiple times without ever indexing documents any other way.
: >>>> Seems like the core admin API should trigger the merge policy logic
: >>>> somehow. The problem here is that the number of segments can grow
: >>>> without bound.
: >>>>
: >>>> Worth a JIRA?
: >>>>
: >>>> Erick
: >>>>
: >>>> ---------------------------------------------------------------------
: >>>> To unsubscribe, e-mail: [email protected]
: >>>> For additional commands, e-mail: [email protected]
: >>>>
: >>>
: >>
: >
:

-Hoss
http://www.lucidworks.com/
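The effect discussed in this thread can be illustrated with a small toy model. This is a hypothetical sketch, not Lucene or Solr code: `SegmentModel` and its methods are illustrative names only, and the merge policy is a deliberately simplified log-style rule keyed on `mergeFactor` (a segment's level is floor(log base `mergeFactor` of its document count), and whenever one level holds `mergeFactor` segments they are merged). Lucene's real merge policies choose merges differently. What the sketch shows is the point Erick makes: if the merge-indexes path only appends segments and never consults the policy, N merges leave N segments, while a normal update (or a forced merge) lets the count collapse.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only (hypothetical names, not Lucene/Solr APIs).
// An index is represented as a list of segment sizes in documents.
final class SegmentModel {
    private final int mergeFactor;
    private final List<Long> segments = new ArrayList<>();

    SegmentModel(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        this.mergeFactor = mergeFactor;
    }

    // Merge-indexes path as described in the thread: the incoming index's
    // segment is appended and the merge policy is never consulted.
    void mergeIndexes(long docs) {
        segments.add(docs);
    }

    // Normal update path: flush a new segment, then let the policy run.
    void addDocuments(long docs) {
        segments.add(docs);
        maybeMerge();
    }

    // Simplified log-style policy. Repeatedly find the lowest level holding at
    // least mergeFactor segments and merge the first mergeFactor of them (in
    // list order) into one segment. Each merge removes mergeFactor - 1
    // segments, so the loop terminates.
    void maybeMerge() {
        while (true) {
            int[] counts = new int[64];
            for (long size : segments) {
                counts[level(size)]++;
            }
            int target = -1;
            for (int lvl = 0; lvl < counts.length; lvl++) {
                if (counts[lvl] >= mergeFactor) {
                    target = lvl;
                    break;
                }
            }
            if (target < 0) {
                return; // no level is full: nothing to merge
            }
            long merged = 0;
            int taken = 0;
            for (int i = 0; i < segments.size() && taken < mergeFactor; ) {
                if (level(segments.get(i)) == target) {
                    merged += segments.remove(i); // remove(int index) returns the size
                    taken++;
                } else {
                    i++;
                }
            }
            segments.add(merged);
        }
    }

    // Crude stand-in for an optimize / force merge: repeatedly merge the two
    // smallest segments until at most maxSegments remain.
    void forceMerge(int maxSegments) {
        if (maxSegments < 1) {
            throw new IllegalArgumentException("maxSegments must be >= 1");
        }
        Collections.sort(segments);
        while (segments.size() > maxSegments) {
            long merged = segments.remove(0) + segments.remove(0);
            segments.add(merged);
            Collections.sort(segments);
        }
    }

    int segmentCount() {
        return segments.size();
    }

    // floor(log_mergeFactor(size)); sizes below mergeFactor are level 0.
    private int level(long size) {
        int lvl = 0;
        while (size >= mergeFactor) {
            size /= mergeFactor;
            lvl++;
        }
        return lvl;
    }
}
```

In this model, with `mergeFactor` 2, fifteen calls to `mergeIndexes(1000)` leave 15 segments; a single `addDocuments(1)` then lets the toy policy reduce them to 5 (sizes 1, 1000, 2000, 4000 and 8000), and `forceMerge(1)` leaves one segment. The exact counts differ from the 2 segments Erick observed, since Lucene's real policy is not this simplified rule; only the qualitative behaviour is meant to carry over.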
