bq: I think you would probably want to control the number of segments with the MapReduceIndexerTool before doing the merge initially
This isn't the nub of the issue. Assume that the index merged in via MRIT has 1 segment each time. Once that index gets merged into the live Solr node, the segments don't get merged, no matter how many times another index is merged in. I'm aware of an "in the wild" situation where, over 6 months, the index accumulated more than 600 segments. All updates were via MRIT:

run MRIT once, 1 segment
run MRIT a second time, 2 segments
. . .
run MRIT the Nth time, N segments (N > 600 in this case)

So running MRIT N times results in N segments on the Solr node, since merging _indexes_ doesn't trigger _segment_ merging AFAIK.

This has been masked in the past, I'd guess, because subsequent "regular" indexing via SolrJ, post.jar, whatever _does_ then trigger segment merging. But we haven't seen the situation reported before where the _only_ way the index gets updated is via index merging.

Index merging is done via MRIT in this case, although this has nothing to do with MRIT and everything to do with the core admin mergeindexes command. MRIT is only relevant here because it's pretty much the first tool that conveniently allowed the only updates to be via mergeindexes.

I reproduced this locally without MRIT by taking a stock Solr, copying the index somewhere else, setting mergeFactor=2, then merging (and committing) again and again. I stopped at 15 segments or so. Then I sent a couple of updates via cURL and the segment count dropped back to 2.

Whether the right place to fix this is the Solr core admin API MERGEINDEXES or the lower-level Lucene call, I don't have a strong opinion.
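A minimal sketch of the local reproduction above, written as the sequence of HTTP requests it would issue, plus the optional periodic optimize (force merge) as a workaround. It assumes a stock Solr 4.x node at `http://localhost:8983/solr` with the default core `collection1` and an index copy already on the node's filesystem at a hypothetical path `/tmp/index_copy`; the helper names (`merge_indexes_url`, `reproduction_plan`, `send`, etc.) are hypothetical. The request parameters (`action=MERGEINDEXES` with `indexDir`, `commit=true`, `optimize=true&maxSegments=N`) are the standard CoreAdmin and update-handler ones. The script only prints the URLs (a dry run); passing each one to `send()` would execute it against a live node.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: stock Solr 4.x on the default port with the default core name.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "collection1"


def merge_indexes_url(base: str, core: str, index_dir: str) -> str:
    """CoreAdmin MERGEINDEXES: merge the index at index_dir into core."""
    params = {"action": "MERGEINDEXES", "core": core, "indexDir": index_dir}
    return f"{base}/admin/cores?{urlencode(params)}"


def commit_url(base: str, core: str) -> str:
    """Hard commit so the newly merged segments become visible."""
    return f"{base}/{core}/update?{urlencode({'commit': 'true'})}"


def optimize_url(base: str, core: str, max_segments: int = 1) -> str:
    """Force merge (optimize) the core down to at most max_segments segments."""
    params = {"optimize": "true", "maxSegments": str(max_segments)}
    return f"{base}/{core}/update?{urlencode(params)}"


def reproduction_plan(base, core, index_dirs, optimize_every=None):
    """Ordered requests: one MERGEINDEXES + commit per index directory.

    If optimize_every is set, a force merge is appended after every
    optimize_every merges (the periodic-optimize workaround).
    """
    plan = []
    for i, index_dir in enumerate(index_dirs, start=1):
        plan.append(merge_indexes_url(base, core, index_dir))
        plan.append(commit_url(base, core))
        if optimize_every and i % optimize_every == 0:
            plan.append(optimize_url(base, core))
    return plan


def send(url: str) -> bytes:
    """Issue a GET against a live Solr node (not called in the dry run below)."""
    with urlopen(url) as resp:
        return resp.read()


if __name__ == "__main__":
    # Dry run: merge the same (hypothetical) index copy 15 times, printing
    # each request instead of sending it.
    dirs = ["/tmp/index_copy"] * 15
    for url in reproduction_plan(SOLR_BASE, CORE, dirs):
        print(url)
```

If MERGEINDEXES really never consults the merge policy, sending the 15 merge/commit pairs should leave roughly 15 new segments, as observed above; passing `optimize_every` shows where a scheduled force merge would cap the count. Keeping URL construction separate from sending makes the plan easy to inspect before pointing it at a live node.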
Of course, one workaround is to periodically issue an optimize, even though Uwe cringes every time that gets mentioned ;)

On Fri, Jul 11, 2014 at 2:42 PM, Mark Miller <[email protected]> wrote:
> I think you would probably want to control the number of segments with the
> MapReduceIndexerTool before doing the merge initially, and if you find you
> have too many segments over time as you add more and more data, use a force
> merge call to reduce the number of segments, either manually or scheduled.
>
> --
> Mark Miller
> about.me/markrmiller
>
> On July 11, 2014 at 4:36:22 PM, Erick Erickson ([email protected])
> wrote:
>> I think I've become aware of an edge case, and I'm wondering whether it's
>> worth a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of
>> indexes and add them one by one to the running Solr node via merge
>> indexes. The mergeFactor appears to be ignored in this scenario.
>> Indeed, I suspect (without proof) that the entire merge policy is
>> never referenced at all.
>>
>> Historically this hasn't mattered, since merging indexes was
>> 1> a rare operation
>> 2> the merge policy _does_ kick in when the index has more documents
>> added to it via the normal (not merge indexes) policy, so things would
>> be cleaned up.
>>
>> All that said, the MapReduceIndexerTool is a scenario where we may be
>> merging multiple times without ever indexing documents any other way.
>> It seems like the core admin API should trigger the merge policy logic
>> somehow. The problem here is that the number of segments can grow
>> without bound.
>>
>> Worth a JIRA?
>>
>> Erick
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
