It's been a whole hour, you're slowing down.....

I promised the original reporter that there would be a JIRA he could
track, got one?

Erick

On Fri, Jul 11, 2014 at 3:17 PM, Robert Muir <[email protected]> wrote:
> You encouraged me to fix it :)
>
> On Fri, Jul 11, 2014 at 6:09 PM, Erick Erickson <[email protected]> wrote:
>> bq: I think you would probably want to control the number of segments
>> with the MapReduceIndexerTool before doing the merge initially
>>
>> This isn't the nub of the issue. Assuming that the number of segments
>> in the index merged in via MRIT is 1 each time, once that index gets
>> merged into the live Solr node, the segments don't get merged no
>> matter how many times another index is merged. I'm aware of an "in the
>> wild" situation where, over 6 months, the index accumulated more than 600
>> segments. All updates were via MRIT.
>>
>> run MRIT once, 1 segment
>> run MRIT a second time, 2 segments
>> .
>> .
>> .
>> run MRIT the Nth time, N segments (N > 600 in this case)
>>
>> So running MRIT N times results in N segments on the Solr node since
>> merge _indexes_ doesn't trigger _segment_ merging AFAIK.
>>
>> I'd guess this has been masked in the past because subsequent
>> "regular" indexing via SolrJ, post.jar, whatever _does_ then trigger
>> segment merging. But we haven't seen the situation reported before
>> where the _only_ way the index gets updated is via index merging.
>> Index merging is done via MRIT in this case although this has nothing
>> to do with MRIT and everything to do with the core admin mergeindexes
>> command. MRIT is only relevant here because it's pretty much the first
>> tool that makes it convenient to update an index exclusively via
>> mergeindexes.
>>
>> I reproduced this locally without MRIT by just taking a stock Solr,
>> copying the index somewhere else, setting mergeFactor=2, then merging it
>> back in (and committing) again and again. Stopped at 15 segments or so. Then
>> sent a couple of updates up via cURL and the segment count dropped
>> back to 2......
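A toy Java model of the behavior described above may make the asymmetry concrete. It is an illustrative sketch, not Lucene's merge code: the class name MergeIndexesModel is hypothetical, each merged-in index is assumed to contribute exactly one segment, and the stand-in "policy" simply merges mergeFactor segments of equal level into one (the real LogMergePolicy and TieredMergePolicy choose merges by segment size). The only thing it encodes from the thread is that MERGEINDEXES adds segments without consulting the policy, while a regular update does consult it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Toy model of a core's segment list (hypothetical; NOT Lucene's merge code).
 * Each segment is tracked only by its "level": 0 for a freshly added segment,
 * k for a segment produced by k cascaded merges.
 */
public class MergeIndexesModel {
    private final int mergeFactor;
    private final List<Integer> levels = new ArrayList<>();

    public MergeIndexesModel(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        this.mergeFactor = mergeFactor;
    }

    /** Models MERGEINDEXES + commit: one new segment, merge policy never consulted. */
    public void mergeIndexes() {
        levels.add(0);
    }

    /** Models a regular update (SolrJ, post.jar, cURL): new segment, then the policy runs. */
    public void addDocuments() {
        levels.add(0);
        runMergePolicy();
    }

    public int segmentCount() {
        return levels.size();
    }

    /** Stand-in policy: merge mergeFactor same-level segments until none qualify. */
    private void runMergePolicy() {
        boolean merged = true;
        while (merged) {
            merged = false;
            for (int level : new ArrayList<>(levels)) {
                if (Collections.frequency(levels, level) >= mergeFactor) {
                    for (int i = 0; i < mergeFactor; i++) {
                        levels.remove(Integer.valueOf(level)); // remove by value, not by index
                    }
                    levels.add(level + 1);
                    merged = true;
                    break; // rescan from scratch after every merge
                }
            }
        }
    }

    public static void main(String[] args) {
        MergeIndexesModel core = new MergeIndexesModel(2);
        for (int i = 0; i < 15; i++) {
            core.mergeIndexes();
        }
        System.out.println("after 15 merges: " + core.segmentCount());  // after 15 merges: 15
        core.addDocuments();
        System.out.println("after one update: " + core.segmentCount()); // after one update: 1
    }
}
```

In the model the count climbs by one per merge and the first regular update collapses the pile. Because the real policies pick merges by size, the post-update count in a live Solr will not necessarily match the model's.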
>>
>> Whether the right place to fix this is Solr core Admin API
>> MERGEINDEXES or in the lower-level Lucene call I don't have a strong
>> opinion about.
>>
>> Of course one work-around is to periodically issue an optimize even
>> though Uwe cringes every time that gets mentioned ;)
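The merge-then-optimize work-around can be sketched as plain Solr HTTP requests. The Java below only builds the request URLs and sends nothing; the base URL, core name, and index paths are placeholders, and the helper class SolrMaintenanceUrls is hypothetical. The parameters themselves (CoreAdmin action=MERGEINDEXES with one or more indexDir values, and commit=true / optimize=true with maxSegments on the update handler) are standard Solr request parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

/** Builds (but does not send) Solr maintenance request URLs. Hypothetical helper. */
public class SolrMaintenanceUrls {
    private final String baseUrl; // placeholder, e.g. "http://localhost:8983/solr"

    public SolrMaintenanceUrls(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // URLEncoder.encode(String, Charset) requires Java 10+.
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /** CoreAdmin MERGEINDEXES: folds the given index directories into the target core. */
    public String mergeIndexes(String core, List<String> indexDirs) {
        StringJoiner url = new StringJoiner("&", baseUrl + "/admin/cores?", "");
        url.add("action=MERGEINDEXES");
        url.add("core=" + enc(core));
        for (String dir : indexDirs) {
            url.add("indexDir=" + enc(dir));
        }
        return url.toString();
    }

    /** Commit on the target core, as in the reproduction above. */
    public String commit(String core) {
        return baseUrl + "/" + enc(core) + "/update?commit=true";
    }

    /** Forced merge ("optimize") down to at most maxSegments segments. */
    public String optimize(String core, int maxSegments) {
        return baseUrl + "/" + enc(core) + "/update?optimize=true&maxSegments=" + maxSegments;
    }

    public static void main(String[] args) {
        SolrMaintenanceUrls solr = new SolrMaintenanceUrls("http://localhost:8983/solr");
        System.out.println(solr.mergeIndexes("collection1", List.of("/tmp/mrit-output/index")));
        System.out.println(solr.commit("collection1"));
        System.out.println(solr.optimize("collection1", 10));
    }
}
```

Asking for a maxSegments above 1 is one way to soften the cost: it bounds the segment count that repeated merges would otherwise let grow, without rewriting the whole index into a single segment on every run.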
>>
>> On Fri, Jul 11, 2014 at 2:42 PM, Mark Miller <[email protected]> wrote:
>>> I think you would probably want to control the number of segments with the 
>>> MapReduceIndexerTool before doing the merge initially, and if you find you 
>>> have too many segments over time as you add more and more data, use a force 
>>> merge call to reduce the number of segments, either manually or on a schedule.
>>>
>>> --
>>> Mark Miller
>>> about.me/markrmiller
>>>
>>> On July 11, 2014 at 4:36:22 PM, Erick Erickson ([email protected]) wrote:
>>>> I think I've become aware of an edge case that may be worth
>>>> a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of
>>>> indexes and add them one by one to the running Solr node via merge
>>>> indexes. The mergeFactor appears to be ignored in this scenario.
>>>> Indeed, I suspect (without proof) that the entire merge policy is
>>>> never referenced at all.
>>>>
>>>> Historically this hasn't mattered, since merging indexes was
>>>> 1> a rare operation
>>>> 2> the merge policy _does_ kick in when more documents are added to
>>>> the index the normal way (not via merge indexes), so things would
>>>> be cleaned up.
>>>>
>>>> All that said, the MapReduceIndexerTool is a scenario where we may be
>>>> merging multiple times without ever indexing documents any other way.
>>>> Seems like the core admin API should trigger the merge policy logic
>>>> somehow. The problem here is that the number of segments can grow
>>>> without bound.
>>>>
>>>> Worth a JIRA?
>>>>
>>>> Erick
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>
>>>>
>>>