[ https://issues.apache.org/jira/browse/LUCENE-8321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037035#comment-17037035 ]
Erick Erickson commented on LUCENE-8321:
----------------------------------------
Part of the rabbit hole would be the number of segments. For instance,
TieredMergePolicy (TMP) caps merged segments at 5 GB by default. We could
certainly raise that cap or create a new merge policy for indexes with lots
of docs...

On a separate note, I've seen instances of terabyte-scale indexes on disk.
Allowing that to grow by a factor of 8 would be another part of the rabbit hole.

That said, I'm not against the idea at all. I'm pretty sure operational issues
would pop out, but that's progress...
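
To put rough numbers on the segment-count concern, here is a back-of-the-envelope
sketch. It assumes no segment ever exceeds the 5 GB cap (forced merges can break
that assumption), treats "5 GB" as 5 GiB, and takes a 1 TiB index growing 8x purely
as an illustration. The class and method names are hypothetical, not Lucene APIs.

```java
/** Hypothetical helper for illustration only; not part of Lucene. */
final class SegmentCountEstimate {
    static final long GIB = 1L << 30;

    /** Smallest number of segments that can hold indexBytes if no segment exceeds capBytes. */
    static long minSegments(long indexBytes, long capBytes) {
        if (indexBytes < 0 || capBytes <= 0) {
            throw new IllegalArgumentException("indexBytes must be >= 0 and capBytes > 0");
        }
        return (indexBytes + capBytes - 1) / capBytes; // ceiling division
    }

    public static void main(String[] args) {
        long cap = 5 * GIB; // assumed cap, matching the 5 GB default mentioned above
        System.out.println("1 TiB index: at least " + minSegments(1024 * GIB, cap) + " segments");
        System.out.println("8 TiB index: at least " + minSegments(8 * 1024 * GIB, cap) + " segments");
    }
}
```

This is only a lower bound: a real index also carries many smaller segments that
have not yet been merged up to the cap.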
> Allow composite readers to have more than 2B documents
> ------------------------------------------------------
>
> Key: LUCENE-8321
> URL: https://issues.apache.org/jira/browse/LUCENE-8321
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
>
> I would like to start discussing removing the limit of ~2B documents that we
> have for indices, while still enforcing it at the segment level for practical
> reasons.
>
> Postings, stored fields, and all other codec APIs would keep working on
> integers to represent doc ids. Only top-level doc ids and numbers of
> documents would need to move to a long. I say "only" because we now mostly
> consume indices per-segment, but there are still a number of places where we
> identify documents by their top-level doc ID, like {{IndexReader#document}},
> top-docs collectors, etc.
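
The split described above (long doc IDs at the top level, int doc IDs inside each
segment) can be sketched as follows. This is a minimal illustration under stated
assumptions, not the proposed implementation: the class LongDocIdResolver and its
methods are hypothetical and do not exist in Lucene, and every segment is assumed
to be non-empty.

```java
import java.util.Arrays;

/** Hypothetical helper, not a Lucene API: maps long top-level doc IDs to (segment, int doc ID). */
final class LongDocIdResolver {
    // docBases[i] is the top-level ID of the first document in segment i;
    // the last entry is the total number of documents.
    private final long[] docBases;

    LongDocIdResolver(int[] segmentMaxDocs) {
        docBases = new long[segmentMaxDocs.length + 1];
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            if (segmentMaxDocs[i] <= 0) {
                throw new IllegalArgumentException("segments must be non-empty");
            }
            docBases[i + 1] = docBases[i] + segmentMaxDocs[i]; // long sum, so no int overflow
        }
    }

    long totalDocs() {
        return docBases[docBases.length - 1];
    }

    /** Index of the segment that contains the given top-level doc ID. */
    int segmentOf(long globalDocId) {
        if (globalDocId < 0 || globalDocId >= totalDocs()) {
            throw new IndexOutOfBoundsException("doc ID out of range: " + globalDocId);
        }
        // Search only the segment start offsets (exclude the trailing total).
        int idx = Arrays.binarySearch(docBases, 0, docBases.length - 1, globalDocId);
        // Exact hit: the ID is the first doc of segment idx.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the segment precedes that point.
        return idx >= 0 ? idx : -idx - 2;
    }

    /** Per-segment doc ID, which still fits in an int. */
    int localDocId(long globalDocId) {
        return (int) (globalDocId - docBases[segmentOf(globalDocId)]);
    }

    public static void main(String[] args) {
        LongDocIdResolver r =
            new LongDocIdResolver(new int[] {2_000_000_000, 2_000_000_000, 2_000_000_000});
        long id = 4_500_000_000L; // larger than Integer.MAX_VALUE
        System.out.println("total=" + r.totalDocs()
            + " segment=" + r.segmentOf(id) + " local=" + r.localDocId(id));
    }
}
```

Codec-level code would only ever see the int returned by localDocId; the long
appears solely at the top-level boundary, where callers identify documents across
the whole index.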