[
https://issues.apache.org/jira/browse/LUCENE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732974#action_12732974
]
Shai Erera commented on LUCENE-1750:
------------------------------------
What happens after several such large segments are created? Wouldn't you want
them to be merged into an even larger segment? Otherwise you'll accumulate many
such segments and search performance will degrade.
I guess I never thought this was a problem. If I have enough disk space, and my
index reaches 600 GB (which is a huge index) split across 10 segments of 60 GB
each, I'd want them merged into one larger 600 GB segment. It would take eons
until I accumulate another 600 GB worth of segments, no?
Maybe we can have two merge factors: 1) for small segments, up to a set size
threshold, we do merges regularly; 2) for really large segments, the merge
factor is different. For example, up to 1 GB the merge factor is 10, and beyond
that it is 20. That would postpone the large-IO merges until enough such
segments accumulate.
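The two-threshold idea above can be sketched as a size-dependent merge factor. This is only an illustration of the proposal (class and method names are hypothetical, not Lucene's actual MergePolicy API):

```java
// Sketch of the two-tier merge-factor proposal: small segments merge
// aggressively, large segments wait for more siblings to accumulate.
// Names and thresholds are illustrative, not Lucene API.
public class TieredMergeFactor {
    static final long SMALL_SEGMENT_LIMIT_BYTES = 1L << 30; // 1 GB threshold
    static final int SMALL_MERGE_FACTOR = 10; // merge 10 small segments at a time
    static final int LARGE_MERGE_FACTOR = 20; // wait for 20 large segments

    /** Returns how many same-tier segments must accumulate before merging. */
    public static int mergeFactorFor(long segmentSizeBytes) {
        return segmentSizeBytes <= SMALL_SEGMENT_LIMIT_BYTES
                ? SMALL_MERGE_FACTOR
                : LARGE_MERGE_FACTOR;
    }
}
```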
Also, w/ the current proposal, how will optimize work? Will it skip the very
large segments, or will they be included too?
> Create a MergePolicy that limits the maximum size of its segments
> -----------------------------------------------------------------
>
> Key: LUCENE-1750
> URL: https://issues.apache.org/jira/browse/LUCENE-1750
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.4.1
> Reporter: Jason Rutherglen
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-1750.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Basically I'm trying to create largish 2-4 GB shards using
> LogByteSizeMergePolicy; however, the attached unit test shows
> segments that exceed maxMergeMB.
> The goal is for segments to be merged up to 2 GB, then all
> merging into that segment stops, and another 2 GB segment is
> created. This helps when replicating in Solr, where creating a
> single optimized 60 GB segment makes the machine stop working
> due to IO and CPU starvation.
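The capped-merge behavior described above can be sketched as a greedy selection that groups segments into merges only while the combined size stays under the cap, and never merges into a segment already at the cap. This is a minimal illustration (the helper name, unitless sizes, and greedy grouping are assumptions, not the attached patch's logic):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of size-capped merge selection: group segments into
// merges whose combined size stays under maxMergeSize, and leave any
// segment already at or over the cap alone. Sizes are in arbitrary
// units (e.g. MB); names are hypothetical, not Lucene's MergePolicy API.
public class SizeCappedMergeSelector {
    /** Groups segment sizes into merge candidates, each totaling under the cap. */
    public static List<List<Long>> selectMerges(List<Long> segmentSizes,
                                                long maxMergeSize) {
        List<List<Long>> merges = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentTotal = 0;
        for (long size : segmentSizes) {
            if (size >= maxMergeSize) {
                continue; // segment already at the cap: stop merging into it
            }
            if (currentTotal + size > maxMergeSize && !current.isEmpty()) {
                // adding this segment would exceed the cap: close the group
                if (current.size() > 1) {
                    merges.add(current); // a single segment is not a merge
                }
                current = new ArrayList<>();
                currentTotal = 0;
            }
            current.add(size);
            currentTotal += size;
        }
        if (current.size() > 1) {
            merges.add(current);
        }
        return merges;
    }
}
```

With a 2048 MB cap, five 600 MB segments would yield one merge of three segments (1800 MB) and one of two, while a 3000 MB segment would be skipped entirely.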