fcofdez opened a new issue, #15713: URL: https://github.com/apache/lucene/issues/15713
As visible in the flamegraph below, `SortingStoredFieldsConsumer` accounts for a non-trivial share of flush time when index sorting is enabled, even on a segment that contains no stored fields.

This is due to how sorting works today: a temporary stored fields writer writes documents in unsorted (indexing) order, and upon flush, that writer is finished and a reader is opened over its output to fetch documents in sorted order. Because the temporary writer is configured with `maxDocsPerChunk=1`, this translates into one seek + read per processed document. This cost is amortized when actual stored fields are present, but it becomes pure overhead when none are stored.

One way to address this would be to detect when no stored fields have been written and take a fast path that writes a single empty sentinel entry rather than reading back N documents one by one. The flush path could then skip the N seeks + reads while still maintaining the contract that the format contains an empty entry per document.

<img width="2983" height="1026" alt="Image" src="https://github.com/user-attachments/assets/f9001b83-b164-42a9-9a78-f0e454f02c2f" />
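
To make the proposed fast path concrete, here is a minimal, self-contained sketch of the idea. It does not use the Lucene API: `SortedFlushSketch`, `TempStore`, `flush`, and the `newToOld` array are hypothetical stand-ins for the temporary writer/reader pair and the sort map, and the `reads` counter models one seek + read per document. Emitting one shared empty entry per document is one possible reading of the "empty sentinel" idea; the actual change may encode it differently.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the sorted flush path; all names here are hypothetical, not Lucene's.
class SortedFlushSketch {

    /** Stand-in for the temporary stored fields writer and the reader opened over it. */
    static final class TempStore {
        private final List<byte[]> docs = new ArrayList<>();
        private boolean anyFieldWritten = false;
        int reads = 0; // number of simulated seek + read operations

        void addDocument(byte[] storedFields) {
            if (storedFields.length > 0) {
                anyFieldWritten = true;
            }
            docs.add(storedFields);
        }

        byte[] read(int docId) {
            reads++; // with one doc per chunk, each fetch is its own seek + read
            return docs.get(docId);
        }

        boolean hasNoStoredFields() {
            return !anyFieldWritten;
        }
    }

    private static final byte[] EMPTY = new byte[0];

    /** Returns documents in sorted order; newToOld[i] is the unsorted doc ID of sorted doc i. */
    static List<byte[]> flush(TempStore temp, int[] newToOld) {
        List<byte[]> sorted = new ArrayList<>(newToOld.length);
        if (temp.hasNoStoredFields()) {
            // Fast path: nothing to read back, emit one empty entry per document.
            for (int i = 0; i < newToOld.length; i++) {
                sorted.add(EMPTY);
            }
            return sorted;
        }
        // Slow path: one seek + read per document, in sorted order.
        for (int oldDoc : newToOld) {
            sorted.add(temp.read(oldDoc));
        }
        return sorted;
    }

    public static void main(String[] args) {
        int[] newToOld = {2, 0, 1};

        TempStore noFields = new TempStore();
        for (int i = 0; i < 3; i++) {
            noFields.addDocument(new byte[0]);
        }
        flush(noFields, newToOld);
        System.out.println("reads without stored fields: " + noFields.reads);

        TempStore withFields = new TempStore();
        withFields.addDocument(new byte[] {1});
        withFields.addDocument(new byte[0]);
        withFields.addDocument(new byte[] {3});
        flush(withFields, newToOld);
        System.out.println("reads with stored fields: " + withFields.reads);
    }
}
```

In this model the segment without stored fields is flushed with zero reads, while the segment with stored fields still pays one read per document, which is the cost that real stored fields amortize.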
