fcofdez opened a new issue, #15713:
URL: https://github.com/apache/lucene/issues/15713

   As the flamegraph below shows, `SortingStoredFieldsConsumer` accounts for a 
non-trivial share of flush time when index sorting is enabled on a segment that 
contains no stored fields. This is due to how sorting works today: a temporary 
stored fields writer is used to write documents in unsorted order, and upon 
flush, that writer is flushed and a reader is created to fetch documents in 
sorted order. Because the temporary writer is configured with 
`maxDocsPerChunk=1`, this translates into one seek + read per processed 
document. This cost is amortized when actual stored fields are present, but 
becomes pure overhead when none are stored.
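
   To make the cost concrete, here is a minimal toy model of the read-back described above. It is a sketch, not Lucene code: `SortedFlushCostSketch`, `TempStore`, `flushSorted` and the `sortedToUnsorted` array are hypothetical names, and each lookup is simply counted as one seek + read.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: none of these names are Lucene APIs. */
public class SortedFlushCostSketch {

    /** Stand-in for the temporary writer/reader pair: one chunk per document. */
    static final class TempStore {
        private final List<byte[]> chunks = new ArrayList<>();
        int seeks = 0; // number of random accesses performed on read-back

        void writeDocument(byte[] storedFields) {
            // Models maxDocsPerChunk=1: every document becomes its own chunk.
            chunks.add(storedFields);
        }

        byte[] readDocument(int docId) {
            seeks++; // each lookup is modelled as one seek + read
            return chunks.get(docId);
        }
    }

    /** Reads documents back in sorted order and returns the number of seeks. */
    static int flushSorted(TempStore store, int[] sortedToUnsorted) {
        for (int oldDocId : sortedToUnsorted) {
            store.readDocument(oldDocId); // copying to the final writer would go here
        }
        return store.seeks;
    }

    public static void main(String[] args) {
        TempStore store = new TempStore();
        for (int i = 0; i < 4; i++) {
            store.writeDocument(new byte[0]); // no stored fields at all
        }
        int seeks = flushSorted(store, new int[] {2, 0, 3, 1});
        System.out.println("seeks = " + seeks); // prints "seeks = 4"
    }
}
```

   Even though every document is empty, the model still performs one lookup per document, which is the overhead the flamegraph attributes to the sorting consumer.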
   
   One way to address this would be to detect when no stored fields have been 
written and take a fast path that writes a single empty sentinel entry rather 
than reading back N documents one by one. This would allow the flush path to 
skip the N seeks + reads while still maintaining the contract that the format 
contains an empty entry per document.
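
   One possible shape for that fast path, again as a toy model rather than a patch: the sketch below tracks whether any document carried stored fields and, if none did, emits the empty entries directly instead of reading them back. `EmptyStoredFieldsFastPathSketch`, `anyFieldWritten`, `addDocument` and `flush` are hypothetical names, and reusing one shared empty array is an assumption about how the sentinel could be represented.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: class and method names are hypothetical, not Lucene APIs. */
public class EmptyStoredFieldsFastPathSketch {
    private final List<byte[]> tempChunks = new ArrayList<>();
    private boolean anyFieldWritten = false; // set by the first non-empty document
    int seeks = 0;                           // random reads performed during flush

    void addDocument(byte[] storedFields) {
        if (storedFields.length > 0) {
            anyFieldWritten = true;
        }
        tempChunks.add(storedFields);
    }

    /** Returns the per-document entries in sorted order. */
    List<byte[]> flush(int[] sortedToUnsorted) {
        List<byte[]> out = new ArrayList<>(sortedToUnsorted.length);
        if (!anyFieldWritten) {
            // Fast path: all entries are empty, so their order does not matter
            // and nothing has to be read back from the temporary store.
            byte[] empty = new byte[0];
            for (int i = 0; i < sortedToUnsorted.length; i++) {
                out.add(empty);
            }
            return out;
        }
        // Slow path: one seek + read per document, as in the current behaviour.
        for (int oldDocId : sortedToUnsorted) {
            seeks++;
            out.add(tempChunks.get(oldDocId));
        }
        return out;
    }

    public static void main(String[] args) {
        EmptyStoredFieldsFastPathSketch noFields = new EmptyStoredFieldsFastPathSketch();
        for (int i = 0; i < 4; i++) {
            noFields.addDocument(new byte[0]);
        }
        int written = noFields.flush(new int[] {2, 0, 3, 1}).size();
        // prints "4 entries, 0 seeks"
        System.out.println(written + " entries, " + noFields.seeks + " seeks");
    }
}
```

   The output still contains one empty entry per document, so the per-document contract holds in this model, while the N lookups disappear whenever no stored field was ever added.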
   
   <img width="2983" height="1026" alt="Image" src="https://github.com/user-attachments/assets/f9001b83-b164-42a9-9a78-f0e454f02c2f" />


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

