[
https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072414#comment-14072414
]
Uwe Schindler commented on LUCENE-5843:
---------------------------------------
bq. I'm not sure what IW does today if you create a too-big index but it's
probably horrible; it may succeed and then at search time you hit nasty
exceptions when we overflow int.
If a single segment exceeds the limit while merging, it's horrible. If you have
an index that exceeds the limit, you get an exception when opening it:
BaseCompositeReader throws an exception in its constructor:
{code:java}
maxDoc += r.maxDoc(); // compute maxDocs
if (maxDoc < 0 /* overflow */) {
  throw new IllegalArgumentException("Too many documents, composite IndexReaders cannot exceed " + Integer.MAX_VALUE);
}
{code}
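To see why the single {{maxDoc < 0}} test is enough, here is a minimal standalone sketch of the same idea. The class and method names ({{MaxDocOverflow}}, {{sumMaxDocs}}) are made up for illustration and are not Lucene code; the only assumption is that every per-reader count is a non-negative int. Under that assumption the running total before each addition is at most Integer.MAX_VALUE, so any overflow wraps to a negative value and is caught right away.

```java
final class MaxDocOverflow {

    /**
     * Sums per-reader document counts (each assumed to be >= 0) and rejects
     * the total as soon as the int accumulator wraps around.
     */
    static int sumMaxDocs(int[] subReaderMaxDocs) {
        int maxDoc = 0;
        for (int n : subReaderMaxDocs) {
            maxDoc += n; // may wrap around to a negative value
            if (maxDoc < 0) { // overflow: two non-negative ints can only wrap to < 0
                throw new IllegalArgumentException(
                    "Too many documents, composite IndexReaders cannot exceed "
                        + Integer.MAX_VALUE);
            }
        }
        return maxDoc;
    }

    public static void main(String[] args) {
        // Fits in an int: accepted.
        System.out.println(sumMaxDocs(new int[] {1_000_000_000, 1_000_000_000}));
        // One document too many: rejected.
        try {
            sumMaxDocs(new int[] {Integer.MAX_VALUE, 1});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

{{Math.addExact}} would be an alternative way to detect the same overflow; it throws ArithmeticException instead.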
The limit is MAX_VALUE; the -1 is just a stupid limitation of TopDocs. In
practice it is actually smaller, because arrays have a maximum size in Java.
DocIdSetIterator's sentinel is not a problem, because it's simply the last
document (MAX_VALUE), which is always the last possible one (the iterator is
always exhausted if you reach the last doc).
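A small sketch of the sentinel arithmetic, under stated assumptions: {{NO_MORE_DOCS}} has the same value as Lucene's DocIdSetIterator.NO_MORE_DOCS (Integer.MAX_VALUE), while {{SentinelSketch}} and {{AllDocs}} are hypothetical stand-ins, not Lucene classes. Doc IDs are zero-based, so with maxDoc capped at Integer.MAX_VALUE the highest real ID is Integer.MAX_VALUE - 1, and the sentinel value only comes back once the iterator is exhausted.

```java
final class SentinelSketch {

    // Same value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical iterator over every doc ID in [0, maxDoc). */
    static final class AllDocs {
        private final int maxDoc;
        private int doc = -1;

        AllDocs(int maxDoc) {
            this.maxDoc = maxDoc;
        }

        /** Moves to the next doc ID, or returns NO_MORE_DOCS when exhausted. */
        int nextDoc() {
            if (doc == NO_MORE_DOCS) {
                return doc; // stay exhausted; doc + 1 would overflow here
            }
            doc = (doc + 1 < maxDoc) ? doc + 1 : NO_MORE_DOCS;
            return doc;
        }

        /** Jumps to target (assumed greater than the current doc), or exhausts. */
        int advance(int target) {
            doc = (target < maxDoc) ? target : NO_MORE_DOCS;
            return doc;
        }
    }

    public static void main(String[] args) {
        AllDocs it = new AllDocs(Integer.MAX_VALUE);
        System.out.println(it.advance(Integer.MAX_VALUE - 1)); // highest real doc ID
        System.out.println(it.nextDoc() == NO_MORE_DOCS);      // iterator is now exhausted
    }
}
```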
> IndexWriter should refuse to create an index with more than INT_MAX docs
> ------------------------------------------------------------------------
>
> Key: LUCENE-5843
> URL: https://issues.apache.org/jira/browse/LUCENE-5843
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
>
> It's more and more common for users these days to create very large indices,
> e.g. indexing lines from log files, or packets on a network, etc., and it's
> not hard to accidentally exceed the maximum number of documents in one index.
> I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that
> value as a sentinel during searching.
> I'm not sure what IW does today if you create a too-big index but it's
> probably horrible; it may succeed and then at search time you hit nasty
> exceptions when we overflow int.
> I think it should throw an IndexFullException instead. It'd be nice if we
> could do this on the very doc that when added would go over the limit, but I
> would also settle for just throwing at flush as well ... i.e. I think what's
> really important is that the index does not become unusable.