It's been a whole hour, you're slowing down..... I promised the original reporter that there would be a JIRA he could track, got one?
Erick

On Fri, Jul 11, 2014 at 3:17 PM, Robert Muir <[email protected]> wrote:
> You encouraged me to fix it :)
>
> On Fri, Jul 11, 2014 at 6:09 PM, Erick Erickson <[email protected]>
> wrote:
>> bq: I think you would probably want to control the number of segments
>> with the MapReduceIndexerTool before doing the merge initially
>>
>> This isn't the nub of the issue. Assuming that the number of segments
>> in the index merged in via MRIT is 1 each time, once that index gets
>> merged into the live Solr node, the segments don't get merged no
>> matter how many times another index is merged. I'm aware of an "In the
>> wild" situation where over 6 months, there are over 600 segments. All
>> updates were via MRIT.
>>
>> run MRIT once, 1 segment
>> run MRIT a second time, 2 segments
>> .
>> .
>> .
>> run MRIT the Nth time, N segments (N > 600 in this case)
>>
>> So running MRIT N times results in N segments on the Solr node since
>> merge _indexes_ doesn't trigger _segment_ merging AFAIK.
>>
>> This has been masked in the past I'd guess because subsequent
>> "regular" indexing via SolrJ, post.jar, whatever _does_ then trigger
>> segment merging. But we haven't seen the situation reported before
>> where the _only_ way the index gets updated is via index merging.
>> Index merging is done via MRIT in this case although this has nothing
>> to do with MRIT and everything to do with the core admin mergeindexes
>> command. MRIT is only relevant here since it's pretty much the first
>> tool that conveniently allowed the only updates to be via
>> mergeindexes.
>>
>> I reproduced this locally without MRIT by just taking a stock Solr,
>> copying the index somewhere else, setting mergeFactor=2 then merging
>> (and committing) again and again. Stopped at 15 segments or so. Then
>> sent a couple of updates up via cURL and the segment count dropped
>> back to 2......
>>
>> Whether the right place to fix this is Solr core Admin API
>> MERGEINDEXES or in the lower-level Lucene call I don't have a strong
>> opinion about.
>>
>> Of course one work-around is to periodically issue an optimize even
>> though Uwe cringes every time that gets mentioned ;)
>>
>> On Fri, Jul 11, 2014 at 2:42 PM, Mark Miller <[email protected]> wrote:
>>> I think you would probably want to control the number of segments with the
>>> MapReduceIndexerTool before doing the merge initially, and if you find you
>>> have too many segments over time as you add more and more data, use a force
>>> merge call to reduce the number of segments, either manually or scheduled.
>>>
>>> --
>>> Mark Miller
>>> about.me/markrmiller
>>>
>>> On July 11, 2014 at 4:36:22 PM, Erick Erickson ([email protected])
>>> wrote:
>>>> I think I've become aware of an edge case that I'm wondering is worth
>>>> a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of
>>>> indexes and add them one by one to the running Solr node via merge
>>>> indexes. The mergeFactor appears to be ignored in this scenario.
>>>> Indeed, I suspect (without proof) that the entire merge policy is
>>>> never referenced at all.
>>>>
>>>> Historically this hasn't mattered, since merging indexes was
>>>> 1> a rare operation
>>>> 2> the merge policy _does_ kick in when the index has more documents
>>>> added to it via the normal (not merge indexes) policy so things would
>>>> be cleaned up.
>>>>
>>>> All that said, the MapReduceIndexerTool is a scenario where we may be
>>>> merging multiple times without ever indexing documents any other way.
>>>> Seems like the core admin API should trigger the merge policy logic
>>>> somehow. The problem here is that the number of segments can grow
>>>> without bound.
>>>>
>>>> Worth a JIRA?
>>>>
>>>> Erick
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
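
For anyone who wants to repeat the local reproduction described in the thread (merge and commit repeatedly, then fall back on the optimize work-around), here is a rough sketch of the requests involved. It only builds the URLs and does not send anything. The base URL, the core name "collection1" and the index path are assumptions for illustration, not values from the thread. The CoreAdmin MERGEINDEXES parameters (action, core, indexDir) and the update handler's commit/optimize/maxSegments parameters are the standard Solr ones as far as I know, but check them against your Solr version.

```python
from urllib.parse import urlencode

# Assumed local Solr base URL; adjust for your installation.
SOLR = "http://localhost:8983/solr"


def merge_indexes_url(core, index_dir, base=SOLR):
    """Build a CoreAdmin MERGEINDEXES request merging index_dir into core."""
    params = {"action": "MERGEINDEXES", "core": core, "indexDir": index_dir}
    return f"{base}/admin/cores?{urlencode(params)}"


def commit_url(core, base=SOLR):
    """Build a commit request so the merged-in segments become visible."""
    return f"{base}/{core}/update?{urlencode({'commit': 'true'})}"


def optimize_url(core, max_segments=1, base=SOLR):
    """Build the optimize (force merge) work-around request."""
    params = {"optimize": "true", "maxSegments": max_segments}
    return f"{base}/{core}/update?{urlencode(params)}"


def repro_plan(core, index_dir, rounds):
    """List the requests for `rounds` merge+commit cycles, then one optimize."""
    plan = []
    for _ in range(rounds):
        plan.append(merge_indexes_url(core, index_dir))
        plan.append(commit_url(core))
    plan.append(optimize_url(core))
    return plan


if __name__ == "__main__":
    # Hypothetical core name and index copy location.
    for url in repro_plan("collection1", "/tmp/index-copy", rounds=3):
        print(url)
```

Each URL could then be issued with curl or any HTTP client. Checking the segment count between rounds, and again after the final optimize, should show the growth described above if the behaviour is as reported.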
