[
https://issues.apache.org/jira/browse/LUCENE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732974#action_12732974
]
Shai Erera commented on LUCENE-1750:
------------------------------------
What happens after several such large segments are created? Wouldn't you want
them to be merged into an even larger segment? Otherwise you'll accumulate many
such segments and search performance will degrade.
I guess I never thought this was a problem. If I have enough disk space, and my
index reaches 600 GB (which is a huge index) split across 10 segments of 60 GB
each, I'd want them merged into one larger 600 GB segment. It would take eons
until I accumulate another 600 GB worth of segments, no?
Maybe we can have two merge factors: 1) for small segments, up to a set size
threshold, we do merges regularly; 2) for really large segments, the merge
factor is different. For example, up to 1 GB the merge factor is 10, and beyond
that it is 20. That would postpone the large-IO merges until enough such
segments accumulate.
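The two-threshold idea above can be sketched as a size-dependent merge factor. This is only an illustration of the proposal (class and method names are hypothetical, not Lucene's actual MergePolicy API):

```java
// Sketch of the two-tier merge-factor proposal: small segments merge
// aggressively, large segments wait for more siblings to accumulate.
// Names and thresholds are illustrative, not Lucene API.
public class TieredMergeFactor {
    static final long SMALL_SEGMENT_LIMIT_BYTES = 1L << 30; // 1 GB threshold
    static final int SMALL_MERGE_FACTOR = 10; // merge 10 small segments at a time
    static final int LARGE_MERGE_FACTOR = 20; // wait for 20 large segments

    /** Returns how many same-tier segments must accumulate before merging. */
    public static int mergeFactorFor(long segmentSizeBytes) {
        return segmentSizeBytes <= SMALL_SEGMENT_LIMIT_BYTES
                ? SMALL_MERGE_FACTOR
                : LARGE_MERGE_FACTOR;
    }
}
```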
Also, w/ the current proposal, how will optimize work? Will it skip the very
large segments, or will they be included too?
> Create a MergePolicy that limits the maximum size of its segments
> -----------------------------------------------------------------
>
> Key: LUCENE-1750
> URL: https://issues.apache.org/jira/browse/LUCENE-1750
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.4.1
> Reporter: Jason Rutherglen
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-1750.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Basically I'm trying to create largish 2-4 GB shards using
> LogByteSizeMergePolicy; however, the attached unit test shows
> segments that exceed maxMergeMB.
> The goal is for segments to be merged up to 2 GB, then all
> merging into that segment stops, and another 2 GB segment is
> created. This helps when replicating in Solr, where creating a
> single optimized 60 GB segment makes the machine stop working
> due to IO and CPU starvation.
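The capped-merge behavior described above can be sketched as a greedy selection that groups segments into merges only while the combined size stays under the cap, and never merges into a segment already at the cap. This is a minimal illustration (the helper name, unitless sizes, and greedy grouping are assumptions, not the attached patch's logic):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of size-capped merge selection: group segments into
// merges whose combined size stays under maxMergeSize, and leave any
// segment already at or over the cap alone. Sizes are in arbitrary
// units (e.g. MB); names are hypothetical, not Lucene's MergePolicy API.
public class SizeCappedMergeSelector {
    /** Groups segment sizes into merge candidates, each totaling under the cap. */
    public static List<List<Long>> selectMerges(List<Long> segmentSizes,
                                                long maxMergeSize) {
        List<List<Long>> merges = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentTotal = 0;
        for (long size : segmentSizes) {
            if (size >= maxMergeSize) {
                continue; // segment already at the cap: stop merging into it
            }
            if (currentTotal + size > maxMergeSize && !current.isEmpty()) {
                // adding this segment would exceed the cap: close the group
                if (current.size() > 1) {
                    merges.add(current); // a single segment is not a merge
                }
                current = new ArrayList<>();
                currentTotal = 0;
            }
            current.add(size);
            currentTotal += size;
        }
        if (current.size() > 1) {
            merges.add(current);
        }
        return merges;
    }
}
```

With a 2048 MB cap, five 600 MB segments would yield one merge of three segments (1800 MB) and one of two, while a 3000 MB segment would be skipped entirely.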