[https://issues.apache.org/jira/browse/SOLR-17942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Puneet Ahuja updated SOLR-17942:
--------------------------------
Description:
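The parameter ramPerThreadHardLimitMB must be less than 2GB (2048 MB) in Lucene,
which means a single indexing thread is forced to flush a segment once its
in-memory buffer exceeds this limit, so no thread can build a segment from 2GB
or more of buffered data.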
Refer:
[https://lucene.apache.org/core/9_9_0/core/org/apache/lucene/index/IndexWriterConfig.html#setRAMPerThreadHardLimitMB(int)]
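The Lucene Javadoc for this setter describes the cap as a safety limit. A DocumentsWriterPerThread addresses its buffered memory with 32-bit signed integers. The arithmetic below illustrates that boundary only and does not call Lucene. A byte count of 2047 MB still fits in a Java `int`, 2048 MB does not, and plain `int` multiplication wraps silently. The class and method names are hypothetical.

```java
/**
 * Illustration only (hypothetical names, no Lucene dependency):
 * shows where a megabyte count stops fitting in a signed 32-bit byte count.
 */
public class Int32BoundaryDemo {

    static final long BYTES_PER_MB = 1024L * 1024L;

    /** True if the byte count for the given number of megabytes fits in a Java int. */
    static boolean byteCountFitsInInt(long megabytes) {
        return megabytes * BYTES_PER_MB <= Integer.MAX_VALUE;
    }

    /** Byte count computed with 32-bit int arithmetic, which silently wraps on overflow. */
    static int byteCountAsInt(int megabytes) {
        return megabytes * 1024 * 1024;
    }

    public static void main(String[] args) {
        for (int mb : new int[] {1945, 2047, 2048, 4096}) {
            System.out.printf("%4d MB: fits=%b, int arithmetic=%d%n",
                mb, byteCountFitsInInt(mb), byteCountAsInt(mb));
        }
    }
}
```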
This issue proposes to make this parameter configurable above the 2GB limit, so
that each thread can write a bigger segment. I plan to use reflection to bypass
this hard-coded limit in Lucene.
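Below is a minimal sketch of the reflection approach. It assumes the limit is stored in a non-final instance field that the public setter validates. To keep the sketch runnable without Lucene on the classpath, `DemoWriterConfig` is a hypothetical stand-in that mimics the documented check (greater than 0 and less than 2048 MB). The field name `perThreadHardLimitMB` is an assumption about Lucene's internals, not a confirmed API.

```java
import java.lang.reflect.Field;

/** Sketch of bypassing a setter's range check via reflection (hypothetical stand-in classes). */
public class RamLimitBypassDemo {

    /** HYPOTHETICAL stand-in for Lucene's IndexWriterConfig; not the real class. */
    static class DemoWriterConfig {
        // Assumed field name; 1945 MB is Lucene's documented default for this limit.
        protected volatile int perThreadHardLimitMB = 1945;

        DemoWriterConfig setRAMPerThreadHardLimitMB(int mb) {
            // Mirrors the documented constraint: greater than 0 and less than 2048 MB.
            if (mb <= 0 || mb >= 2048) {
                throw new IllegalArgumentException(
                    "PerThreadHardLimit must be greater than 0 and less than 2048MB");
            }
            this.perThreadHardLimitMB = mb;
            return this;
        }

        int getRAMPerThreadHardLimitMB() {
            return perThreadHardLimitMB;
        }
    }

    /** Writes the limit straight into the field, skipping the setter's validation. */
    static void forceHardLimitMB(DemoWriterConfig cfg, int mb) {
        if (mb <= 0) {
            throw new IllegalArgumentException("limit must be positive: " + mb);
        }
        try {
            Field field = DemoWriterConfig.class.getDeclaredField("perThreadHardLimitMB");
            field.setAccessible(true); // required for non-public fields
            field.setInt(cfg, mb);     // bypasses the range check in the setter
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not set perThreadHardLimitMB reflectively", e);
        }
    }

    public static void main(String[] args) {
        DemoWriterConfig cfg = new DemoWriterConfig();
        try {
            cfg.setRAMPerThreadHardLimitMB(4096); // rejected by the setter
        } catch (IllegalArgumentException e) {
            System.out.println("setter rejected 4096 MB: " + e.getMessage());
        }
        forceHardLimitMB(cfg, 4096);
        System.out.println("limit after reflection: " + cfg.getRAMPerThreadHardLimitMB() + " MB");
    }
}
```

`Class.getDeclaredField` only sees fields declared on that exact class. Against real Lucene, the lookup would therefore have to target whichever class actually declares the field, which is assumed here to be `LiveIndexWriterConfig`, the superclass of `IndexWriterConfig`. Writing the field directly also skips the safety check described in the Javadoc.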
When indexing high-dimensional vector data, each segment has its own HNSW
graph, so more segments mean more graphs to search per shard and more graph
rebuild work during merges. With this change, a single indexing thread can
flush fewer, larger segments, which is generally more resource-efficient
for vector-heavy workloads.
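The effect on segment count can be approximated with a back-of-the-envelope model. It assumes a single indexing thread and a `ramBufferSizeMB` high enough that only the per-thread hard limit triggers flushes. It also ignores merges and the difference between buffered RAM and on-disk segment size. Under those assumptions, each forced flush yields one segment, and therefore one HNSW graph, so the count is a ceiling division. The 16 GB workload and the class name are hypothetical.

```java
/**
 * Simplified model (hypothetical): segments flushed by one thread when the
 * per-thread hard limit is the only flush trigger. Ignores merges,
 * ramBufferSizeMB, maxBufferedDocs, and RAM-vs-disk size differences.
 */
public class FlushCountModel {

    /** Number of segments (one HNSW graph each) produced from totalMB of buffered data. */
    static long segmentsFlushed(long totalMB, long perThreadLimitMB) {
        if (totalMB < 0 || perThreadLimitMB <= 0) {
            throw new IllegalArgumentException("totalMB must be >= 0 and the limit > 0");
        }
        // Ceiling division: a partially filled buffer still becomes a segment on the final flush.
        return (totalMB + perThreadLimitMB - 1) / perThreadLimitMB;
    }

    public static void main(String[] args) {
        long totalMB = 16L * 1024L; // hypothetical: 16 GB buffered by a single thread
        for (long limit : new long[] {1945, 2047, 4096, 8192}) {
            System.out.printf("limit %5d MB -> %d segments%n", limit, segmentsFlushed(totalMB, limit));
        }
    }
}
```

Under this model, raising the cap from just under 2048 MB to 4096 MB roughly halves the number of per-thread segments, and with it the number of graphs built at flush time.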
was:
The parameter ramPerThreadHardLimitMB cannot be larger than 2GB in Lucene,
which means a single thread cannot write segments larger than 2GB.
Refer:
[https://lucene.apache.org/core/9_9_0/core/org/apache/lucene/index/IndexWriterConfig.html#setRAMPerThreadHardLimitMB(int)]
This issue proposes to make this parameter configurable above the 2GB limit, so
that each thread can write a bigger segment. I plan to use reflection to bypass
this hard-coded limit in Lucene.
When indexing high dimensional vector data, each
> Raising the hardcoded limit of lucene parameter ramPerThreadHardLimitMB using
> reflection
> ----------------------------------------------------------------------------------------
>
> Key: SOLR-17942
> URL: https://issues.apache.org/jira/browse/SOLR-17942
> Project: Solr
> Issue Type: Task
> Affects Versions: main (10.0)
> Reporter: Puneet Ahuja
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)