You encouraged me to fix it :)

On Fri, Jul 11, 2014 at 6:09 PM, Erick Erickson <[email protected]> wrote:
> bq: I think you would probably want to control the number of segments
> with the MapReduceIndexerTool before doing the merge initially
>
> This isn't the nub of the issue. Assuming that the number of segments
> in the index merged in via MRIT is 1 each time, once that index gets
> merged into the live Solr node, the segments don't get merged no
> matter how many times another index is merged. I'm aware of an "in the
> wild" situation where, over 6 months, the index grew to over 600
> segments. All updates were via MRIT.
>
> run MRIT once, 1 segment
> run MRIT a second time, 2 segments
> .
> .
> .
> run MRIT the Nth time, N segments (N > 600 in this case)
>
> So running MRIT N times results in N segments on the Solr node since
> merge _indexes_ doesn't trigger _segment_ merging AFAIK.
>
> This has been masked in the past I'd guess because subsequent
> "regular" indexing via SolrJ, post.jar, whatever _does_ then trigger
> segment merging. But we haven't seen the situation reported before
> where the _only_ way the index gets updated is via index merging.
> Index merging is done via MRIT in this case although this has nothing
> to do with MRIT and everything to do with the core admin mergeindexes
> command. MRIT is only relevant here since it's pretty much the first
> tool that conveniently allowed the only updates to be via
> mergeindexes.
>
> I reproduced this locally without MRIT by just taking a stock Solr,
> copying the index somewhere else, setting mergeFactor=2 then merging
> (and committing) again and again. Stopped at 15 segments or so. Then
> sent a couple of updates up via cURL and the segment count dropped
> back to 2......
>
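To make the reproduction above concrete, here is a toy model of the behavior as described in this thread. It is only a sketch: the function names are hypothetical, each segment is represented by nothing more than its document count, and the "merge policy" is a deliberately simplistic stand-in (merge the two smallest segments until at most mergeFactor remain), not Lucene's real merge policy.

```python
def merge_indexes(segments, incoming_doc_count):
    """Model mergeindexes as described in the thread: the incoming index
    arrives as one more segment and no merge policy is consulted."""
    return segments + [incoming_doc_count]


def apply_toy_merge_policy(segments, merge_factor):
    """Simplistic stand-in for a merge policy (an assumption, not Lucene's
    algorithm): merge the two smallest segments until at most
    merge_factor segments remain."""
    segments = sorted(segments)
    while len(segments) > merge_factor:
        merged = segments[0] + segments[1]
        segments = sorted([merged] + segments[2:])
    return segments


def add_documents(segments, doc_count, merge_factor):
    """Model regular indexing: new documents land in a new segment,
    and then the merge policy gets a chance to run."""
    return apply_toy_merge_policy(segments + [doc_count], merge_factor)


# Merge 15 one-segment indexes of 1000 docs each, with no other updates.
segments = []
for _ in range(15):
    segments = merge_indexes(segments, 1000)
count_after_merges = len(segments)  # 15: one segment per merge

# A small regular update lets the (toy) merge policy run.
segments = add_documents(segments, 10, merge_factor=2)
count_after_update = len(segments)  # 2: back down to mergeFactor
```

In this model the segment count grows by one per merge and only drops once a regular update runs the policy, which is the pattern reported above.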
> I don't have a strong opinion about whether the right place to fix
> this is the Solr core admin API MERGEINDEXES command or the
> lower-level Lucene call.
>
> Of course one work-around is to periodically issue an optimize even
> though Uwe cringes every time that gets mentioned ;)
>
> On Fri, Jul 11, 2014 at 2:42 PM, Mark Miller <[email protected]> wrote:
>> I think you would probably want to control the number of segments with the 
>> MapReduceIndexerTool before doing the merge initially, and if you find you 
>> have too many segments over time as you add more and more data, use a force 
>> merge call to reduce the number of segments, either manually or scheduled.
>>
>> --
>> Mark Miller
>> about.me/markrmiller
>>
>> On July 11, 2014 at 4:36:22 PM, Erick Erickson ([email protected]) 
>> wrote:
>>> I think I've become aware of an edge case, and I'm wondering if it's
>>> worth a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of
>>> indexes and add them one by one to the running Solr node via merge
>>> indexes. The mergeFactor appears to be ignored in this scenario.
>>> Indeed, I suspect (without proof) that the entire merge policy is
>>> never referenced at all.
>>>
>>> Historically this hasn't mattered, since merging indexes was
>>> 1> a rare operation
>>> 2> the merge policy _does_ kick in when the index has more documents
>>> added to it via the normal (not merge indexes) policy so things would
>>> be cleaned up.
>>>
>>> All that said, the mapReduceIndexerTool is a scenario where we may be
>>> merging multiple times without ever indexing documents any other way.
>>> Seems like the core admin API should trigger the merge policy logic
>>> somehow. The problem here is that the number of segments can grow
>>> without bound.
>>>
>>> Worth a JIRA?
>>>
>>> Erick
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
