[ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955998#comment-17955998
 ] 

Yura edited comment on SOLR-17775 at 6/4/25 12:34 AM:
------------------------------------------------------

I think changing DocsStreamer would be much more involved and use far more 
memory, since each document is a LinkedHashMap. This patch is a surgical 
optimization for ValueSource-based fields and uses a compact 
IntObjectHashMap, which is likely no larger than the DocSlice itself.
The coordinator likely already materializes all documents (plus whatever 
the JSON/XML serializers hold), so the extra footprint is minimal.

ValueSource classes are normally part of the query itself and are highly 
optimized for a leap-frog approach.


was (Author: yura):
I think changing DocsStreamer would be much more involved and use far more 
memory, since each document is a LinkedHashMap. This patch is a surgical 
optimization for ValueSource-based fields and uses a compact 
IntObjectHashMap, which is likely no larger than the DocSlice itself.
The coordinator likely already materializes all documents (plus whatever 
the JSON/XML serializers hold), so the extra footprint is minimal.

ValueSource classes are normally part of the query itself and are highly 
optimized for a leap-frog approach.

I'm not sure if classes like Lucene90CompressingStoredFieldsReader are 
optimized for strictly in-order reads; some code still seeks from 0, so 
reading in order may not save work without further changes.

> Optimize ValueSourceAugmenter
> -----------------------------
>
>                 Key: SOLR-17775
>                 URL: https://issues.apache.org/jira/browse/SOLR-17775
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Yura
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> h3. Problem
> ValueSourceAugmenter currently calculates function values on-demand during 
> transform(), performing expensive binary searches and reader lookups for each 
> document individually.
> h3. Solution
> Pre-calculate function values for all result set documents during 
> setContext() by:
>  * Collecting and sorting document IDs from the DocList
>  * Iterating sequentially through the sorted documents to calculate values 
> once per reader segment
>  * Storing results in a hash map for O(1) lookup during transform()
>  * Falling back to on-demand calculation for documents outside the 
> pre-calculated set (RTG cases)
> h3. Performance Benefit
> Replaces repeated "find document at position N" operations (binary search per 
> document) with efficient "get next document" iteration (sequential processing 
> within reader segments), significantly reducing lookup overhead.
> h3. Compatibility
> Maintains full backward compatibility through the fallback mechanism for 
> edge cases.
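
The steps listed under "Solution" in the quoted description can be sketched 
roughly as follows. This is an illustration only, not the code from the 
patch: the Segment class and the precompute, computeOnDemand and lookup 
methods are hypothetical names, a plain java.util.HashMap stands in for the 
IntObjectHashMap mentioned in the comment, and segments are modelled as 
simple docBase/maxDoc ranges rather than real Lucene readers.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

public class PrecomputeSketch {

    /** Hypothetical stand-in for one segment covering global IDs [docBase, docBase + maxDoc). */
    static final class Segment {
        final int docBase;
        final int maxDoc;
        final IntToDoubleFunction values; // maps a segment-local doc ID to a function value

        Segment(int docBase, int maxDoc, IntToDoubleFunction values) {
            this.docBase = docBase;
            this.maxDoc = maxDoc;
            this.values = values;
        }
    }

    /** Sorts the result doc IDs once, then walks the segments front to back. */
    static Map<Integer, Double> precompute(int[] resultDocIds, Segment[] segments) {
        int[] sorted = resultDocIds.clone();
        Arrays.sort(sorted);
        Map<Integer, Double> cache = new HashMap<>(sorted.length * 2);
        int seg = 0;
        for (int docId : sorted) {
            // IDs are ascending, so the segment cursor only moves forward.
            // Assumes every ID falls inside one of the segments.
            while (docId >= segments[seg].docBase + segments[seg].maxDoc) {
                seg++;
            }
            Segment s = segments[seg];
            cache.put(docId, s.values.applyAsDouble(docId - s.docBase));
        }
        return cache;
    }

    /** On-demand path: binary search for the owning segment of a single document. */
    static double computeOnDemand(int docId, Segment[] segments) {
        int lo = 0;
        int hi = segments.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (segments[mid].docBase <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        Segment s = segments[lo];
        return s.values.applyAsDouble(docId - s.docBase);
    }

    /** Uses the pre-computed value when present, otherwise falls back to on-demand. */
    static double lookup(int docId, Map<Integer, Double> cache, Segment[] segments) {
        Double cached = cache.get(docId);
        return cached != null ? cached : computeOnDemand(docId, segments);
    }

    public static void main(String[] args) {
        Segment[] segments = {
            new Segment(0, 10, local -> local * 1.0),
            new Segment(10, 5, local -> 100.0 + local),
        };
        Map<Integer, Double> cache = precompute(new int[] {12, 3, 7}, segments);
        System.out.println(lookup(12, cache, segments)); // served from the cache
        System.out.println(lookup(14, cache, segments)); // not pre-computed, uses the fallback
    }
}
```

In this sketch the pre-computation does one sort plus a single forward pass 
over the segments, while the fallback path pays a binary search per 
document, which is the cost the issue description aims to avoid for the 
common case.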



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

