You encouraged me to fix it :)

On Fri, Jul 11, 2014 at 6:09 PM, Erick Erickson <[email protected]> wrote:
> bq: I think you would probably want to control the number of segments
> with the MapReduceIndexerTool before doing the merge initially
>
> This isn't the nub of the issue. Assuming that the number of segments
> in the index merged in via MRIT is 1 each time, once that index gets
> merged into the live Solr node, the segments don't get merged no
> matter how many times another index is merged. I'm aware of an "In the
> wild" situation where over 6 months, there are over 600 segments. All
> updates were via MRIT.
>
> run MRIT once, 1 segment
> run MRIT a second time, 2 segments
> .
> .
> .
> run MRIT the Nth time, N segments (N > 600 in this case)
>
> So running MRIT N times results in N segments on the Solr node since
> merge _indexes_ doesn't trigger _segment_ merging AFAIK.
>
> This has been masked in the past, I'd guess, because subsequent
> "regular" indexing via SolrJ, post.jar, whatever _does_ then trigger
> segment merging. But we haven't seen the situation reported before
> where the _only_ way the index gets updated is via index merging.
> Index merging is done via MRIT in this case although this has nothing
> to do with MRIT and everything to do with the core admin mergeindexes
> command. MRIT is only relevant here since it's pretty much the first
> tool that conveniently allowed the only updates to be via
> mergeindexes.
>
> I reproduced this locally without MRIT by just taking a stock Solr,
> copying the index somewhere else, setting mergeFactor=2 then merging
> (and committing) again and again. Stopped at 15 segments or so. Then
> sent a couple of updates up via cURL and the segment count dropped
> back to 2......
>
> Whether the right place to fix this is Solr core Admin API
> MERGEINDEXES or in the lower-level Lucene call I don't have a strong
> opinion about.
>
> Of course one work-around is to periodically issue an optimize even
> though Uwe cringes every time that gets mentioned ;)
>
> On Fri, Jul 11, 2014 at 2:42 PM, Mark Miller <[email protected]> wrote:
>> I think you would probably want to control the number of segments with the
>> MapReduceIndexerTool before doing the merge initially, and if you find you
>> have too many segments over time as you add more and more data, use a force
>> merge call to reduce the number of segments, either manually or scheduled.
>>
>> --
>> Mark Miller
>> about.me/markrmiller
>>
>> On July 11, 2014 at 4:36:22 PM, Erick Erickson ([email protected])
>> wrote:
>>> I think I've become aware of an edge case and I'm wondering whether it's worth
>>> a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of
>>> indexes and add them one by one to the running Solr node via merge
>>> indexes. The mergeFactor appears to be ignored in this scenario.
>>> Indeed, I suspect (without proof) that the entire merge policy is
>>> never referenced at all.
>>>
>>> Historically this hasn't mattered, since merging indexes was
>>> 1> a rare operation
>>> 2> the merge policy _does_ kick in when the index has more documents
>>> added to it via the normal (not merge indexes) policy so things would
>>> be cleaned up.
>>>
>>> All that said, the MapReduceIndexerTool is a scenario where we may be
>>> merging multiple times without ever indexing documents any other way.
>>> Seems like the core admin API should trigger the merge policy logic
>>> somehow. The problem here is that the number of segments can grow
>>> without bound.
>>>
>>> Worth a JIRA?
>>>
>>> Erick
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
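The segment growth Erick describes can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Lucene's merge code: a segment is modeled only by its document count, each MRIT run is assumed to deliver exactly one segment, and `maybe_merge` is a hypothetical, greatly simplified log-style policy (merge `merge_factor` segments whenever that many share a size level). Real Lucene policies such as LogMergePolicy and TieredMergePolicy are considerably more involved, so the counts are illustrative only.

```python
from typing import Dict, List, Optional


def level(doc_count: int, merge_factor: int) -> int:
    """Hypothetical size bucket: how many times doc_count can be divided by merge_factor."""
    lvl = 0
    while doc_count >= merge_factor:
        doc_count //= merge_factor
        lvl += 1
    return lvl


def maybe_merge(segments: List[int], merge_factor: int) -> List[int]:
    """Toy log-style policy: while merge_factor segments share a level, merge them into one."""
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    segments = list(segments)
    while True:
        by_level: Dict[int, List[int]] = {}
        for i, docs in enumerate(segments):
            by_level.setdefault(level(docs, merge_factor), []).append(i)
        # Smallest level that has accumulated at least merge_factor segments, if any.
        full = [lvl for lvl in sorted(by_level) if len(by_level[lvl]) >= merge_factor]
        if not full:
            return segments
        chosen = set(by_level[full[0]][:merge_factor])
        merged = sum(segments[i] for i in chosen)
        segments = [d for i, d in enumerate(segments) if i not in chosen] + [merged]


def simulate(runs: int, docs_per_run: int, merge_factor: int,
             trigger_policy: bool = False,
             optimize_every: Optional[int] = None) -> List[int]:
    """Return segment sizes after `runs` index merges, each adding one segment."""
    segments: List[int] = []
    for run in range(1, runs + 1):
        # Like MERGEINDEXES as described in the thread: the incoming segment is
        # appended as-is and no merge policy runs here.
        segments.append(docs_per_run)
        if trigger_policy:
            # The behaviour asked for in the thread: consult the merge policy after each merge.
            segments = maybe_merge(segments, merge_factor)
        if optimize_every and run % optimize_every == 0:
            # The work-around: a scheduled optimize (force merge) down to one segment.
            segments = [sum(segments)]
    return segments


if __name__ == "__main__":
    print(len(simulate(600, 100, 2)))                       # 600
    print(len(simulate(600, 100, 2, trigger_policy=True)))  # 4
    print(len(simulate(620, 100, 2, optimize_every=50)))    # 21
```

With equal-sized incoming segments and a merge factor of 2, the toy policy behaves like a binary counter, so 600 runs leave 4 segments (600 = 512 + 64 + 16 + 8) rather than 600. A scheduled optimize bounds the count differently: it collapses everything to one segment each time it fires, and segments accumulate again in between.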
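The optimize work-around can be scripted against Solr's HTTP APIs. A minimal sketch, assuming a single core: the base URL `http://localhost:8983/solr`, the core name and the index paths are hypothetical. It builds a CoreAdmin `MERGEINDEXES` request (one `indexDir` parameter per source index), a commit as in Erick's reproduction, and an `optimize=true&maxSegments=N` update request; `merge_then_optimize` sends them in order with the standard library's `urlopen`.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL; adjust for the actual Solr node.
SOLR_BASE = "http://localhost:8983/solr"


def merge_indexes_url(core: str, index_dirs: List[str]) -> str:
    """CoreAdmin MERGEINDEXES: add the segments of each indexDir to `core`."""
    params = [("action", "MERGEINDEXES"), ("core", core)]
    params += [("indexDir", d) for d in index_dirs]
    return f"{SOLR_BASE}/admin/cores?{urlencode(params)}"


def commit_url(core: str) -> str:
    """Commit so the merged segments become visible to searchers."""
    return f"{SOLR_BASE}/{core}/update?{urlencode({'commit': 'true'})}"


def optimize_url(core: str, max_segments: int = 1) -> str:
    """Force merge (optimize) the core down to at most `max_segments` segments."""
    params = {"optimize": "true", "maxSegments": str(max_segments)}
    return f"{SOLR_BASE}/{core}/update?{urlencode(params)}"


def merge_then_optimize(core: str, index_dirs: List[str], max_segments: int = 1) -> None:
    """Send merge, commit and optimize to a live Solr node, in that order."""
    for url in (merge_indexes_url(core, index_dirs),
                commit_url(core),
                optimize_url(core, max_segments)):
        with urlopen(url) as resp:  # urlopen raises HTTPError on an error status
            resp.read()
```

Because a full optimize rewrites the index, it is probably better run on a schedule (every N merges, or nightly) than after every merge; a `maxSegments` value above 1 typically makes each force merge cheaper.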
