[ https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955998#comment-17955998 ]
Yura edited comment on SOLR-17775 at 6/4/25 12:34 AM:
------------------------------------------------------

I think changing DocsStreamer would be much more involved and use far more memory, since each document is a LinkedHashMap. This patch is a surgical optimization for ValueSource-based fields and uses a compact IntObjectHashMap, likely no larger than the DocSlice itself. The coordinator likely already materializes all documents (plus whatever JSON/XML serializers hold), so the extra footprint is minimal. ValueSource classes are normally part of the query itself and highly optimized for a leapfrog approach.

was (Author: yura):
I think changing DocsStreamer would be much more involved and use far more memory, since each document is a LinkedHashMap. This patch is a surgical optimization for ValueSource-based fields and uses a compact IntObjectHashMap, likely no larger than the DocSlice itself. The coordinator likely already materializes all documents (plus whatever JSON/XML serializers hold), so the extra footprint is minimal. ValueSource classes are normally part of the query itself and highly optimized for a leapfrog approach. I’m not sure if classes like Lucene90CompressingStoredFieldsReader are optimized for strictly in-order reads; some code still seeks from 0, so it may not save work without further changes.

> Optimize ValueSourceAugmenter
> -----------------------------
>
>                 Key: SOLR-17775
>                 URL: https://issues.apache.org/jira/browse/SOLR-17775
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Yura
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> h3. Problem
> ValueSourceAugmenter currently calculates function values on-demand during
> transform(), performing expensive binary searches and reader lookups for each
> document individually.
> h3. Solution
> Pre-calculate function values for all result set documents during
> setContext() by:
> * Collecting and sorting document IDs from DocList
> * Sequential iteration through sorted documents to calculate values once per
> reader segment
> * Storing results in hash map for O(1) lookup during transform()
> * Fallback to on-demand calculation for documents outside the pre-calculated
> set (RTG cases)
> h3. Performance Benefit
> Replaces repeated "find document at position N" operations (binary search per
> document) with efficient "get next document" iteration (sequential processing
> within reader segments), significantly reducing lookup overhead.
> h3. Compatibility
> Maintains full backward compatibility through fallback mechanism for edge
> cases.
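
For illustration only, the approach in the issue description can be sketched roughly as follows. This is not the code from the patch: the class PrecomputeSketch, its docBases array, and the compute() placeholder are hypothetical stand-ins for Solr's reader segments and function values, and java.util.HashMap is used in place of IntObjectHashMap to keep the sketch free of dependencies.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of "precompute in sorted order, look up later". Not Solr code. */
final class PrecomputeSketch {
    // Assumed segment layout: segment i holds global doc IDs [docBases[i], docBases[i + 1]).
    // docBases[0] must be 0 and the array must be ascending.
    private final int[] docBases;
    private final Map<Integer, Double> cache = new HashMap<>();
    int segmentSwitches = 0; // how often precompute() moved on to a different segment

    PrecomputeSketch(int[] docBases) {
        this.docBases = docBases;
    }

    // Placeholder for the real function value of a segment-local document.
    static double compute(int segment, int localDoc) {
        return segment * 1000 + localDoc;
    }

    // Analogue of setContext(): sort the IDs, then walk them once, moving forward only.
    void precompute(int[] docIds) {
        int[] sorted = docIds.clone();
        Arrays.sort(sorted);
        int seg = -1;
        for (int doc : sorted) {
            int next = Math.max(seg, 0);
            // Advance to the segment containing doc; never step backwards.
            while (next + 1 < docBases.length && docBases[next + 1] <= doc) {
                next++;
            }
            if (next != seg) {
                seg = next;
                segmentSwitches++;
            }
            cache.put(doc, compute(seg, doc - docBases[seg]));
        }
    }

    // Analogue of transform(): hash lookup first, on-demand binary search as the fallback.
    double valueFor(int doc) {
        Double cached = cache.get(doc);
        if (cached != null) {
            return cached;
        }
        int idx = Arrays.binarySearch(docBases, doc);
        int seg = idx >= 0 ? idx : -idx - 2; // last segment whose base is <= doc
        return compute(seg, doc - docBases[seg]);
    }
}
```

With docBases {0, 100, 250} and result documents {260, 5, 120, 101, 7}, precompute() touches each of the three segments once, in order, instead of running one binary search per document. A document that was never pre-calculated (the RTG case in the description) still gets a value through the fallback path in valueFor().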