sam-herman commented on issue #15420:
URL: https://github.com/apache/lucene/issues/15420#issuecomment-3530905602

   > Striking the right balance between write amplification and merging costs 
is key. While I think the typical tier-merging system might not work best for 
graph structures, it doesn't mean we should abandon it and attempt to merge in 
teeny segments into a single giant structure. Finding an appropriate balance is 
what you need.
   
   Thanks for the feedback. To clarify, the goal of this RFC is not to 
encourage more frequent merges or to replace Lucene’s tiered merge policy. The 
motivation behind exposing a `RandomAccessWriter`-like primitive is to enable a 
path toward optional in-place updatable graph segments, which would 
significantly improve the handling of streaming inserts and deletes for 
vector/graph fields. The use case comes from real-world workloads similar to 
those we outline in the OpenSearch JVector proposal 
(https://github.com/opensearch-project/opensearch-jvector/issues/169).
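   
   To make the shape of such a primitive concrete, here is a minimal sketch. 
Only the name `RandomAccessWriter` comes from this RFC; the method signatures 
and the `FileChannelRandomAccessWriter` class are illustrative assumptions and 
not an existing Lucene API. The sketch uses plain `java.nio` positional writes 
rather than Lucene's `Directory` abstraction.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical interface: NOT part of Lucene. Signatures are assumptions.
interface RandomAccessWriter extends AutoCloseable {
    // Overwrites (or extends) the file starting at the given byte offset.
    void writeBytes(long position, byte[] src) throws IOException;

    // Current length of the underlying file in bytes.
    long length() throws IOException;

    @Override
    void close() throws IOException;
}

// Illustrative implementation backed by a java.nio FileChannel.
final class FileChannelRandomAccessWriter implements RandomAccessWriter {
    private final FileChannel channel;

    FileChannelRandomAccessWriter(Path path) throws IOException {
        this.channel = FileChannel.open(
            path,
            StandardOpenOption.CREATE,
            StandardOpenOption.READ,
            StandardOpenOption.WRITE);
    }

    @Override
    public void writeBytes(long position, byte[] src) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(src);
        long pos = position;
        // A positional write may be partial, so loop until the buffer drains.
        while (buf.hasRemaining()) {
            pos += channel.write(buf, pos);
        }
    }

    @Override
    public long length() throws IOException {
        return channel.size();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

   The point of the sketch is only that a write takes an explicit offset, in 
contrast to an append-only output stream; durability (fsync) and checksums are 
deliberately left out.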
   
   What we’re proposing is essentially a staged, opt-in evolution of the codec 
and segment APIs — not a change to Lucene’s core merge strategy:
   
   **Phase 1** — Leading-segment incremental merging
   
   The initial step lets only the largest (leading) graph segment append new 
nodes incrementally, so the full graph structure does not have to be rebuilt 
on every merge. This doesn’t require changing merge frequency; it simply 
avoids unnecessary rewrite work in graph-heavy fields.
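   
   As a rough illustration of the idea, the toy graph below absorbs a small 
segment by appending its vectors to an existing graph, so the work scales with 
the number of new nodes instead of the size of the merged result. The 
`AppendableGraph` class is hypothetical, and it picks neighbors by brute-force 
distance with no pruning; a real HNSW-style implementation would use graph 
search and diversity heuristics instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory stand-in for a "leading" graph segment.
final class AppendableGraph {
    private final int maxNeighbors;
    private final List<float[]> vectors = new ArrayList<>();
    private final List<List<Integer>> neighbors = new ArrayList<>();

    AppendableGraph(int maxNeighbors) {
        this.maxNeighbors = maxNeighbors;
    }

    int size() {
        return vectors.size();
    }

    List<Integer> neighborsOf(int node) {
        return neighbors.get(node);
    }

    // Appends one node and links it to its nearest existing nodes.
    // Existing nodes are only touched to receive a back-link.
    int append(float[] vector) {
        int id = vectors.size();
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < id; i++) {
            candidates.add(i);
        }
        // Brute-force nearest neighbors (assumption made for brevity).
        candidates.sort(Comparator.comparingDouble(
            (Integer i) -> squaredDistance(vectors.get(i), vector)));
        List<Integer> chosen = new ArrayList<>(
            candidates.subList(0, Math.min(maxNeighbors, candidates.size())));
        vectors.add(vector);
        neighbors.add(chosen);
        for (int n : chosen) {
            neighbors.get(n).add(id); // back-link, no pruning in this sketch
        }
        return id;
    }

    // "Merges" a small segment by appending its vectors one by one.
    void appendAll(List<float[]> smallSegment) {
        for (float[] v : smallSegment) {
            append(v);
        }
    }

    private static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```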
   
   **Phase 2** — Per-field segment policies
   
   Similar to today’s per-field formats, this phase would allow a field (like a 
graph/vector field) to opt into a different segment-lifespan policy without 
affecting the rest of the index. This isolates the approach and keeps Lucene’s 
tiered merging intact for all other structures.
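   
   A per-field opt-in could look roughly like the lookup below, which follows 
the same resolve-by-field-name pattern as the existing per-field formats. The 
`SegmentLifespanPolicy` enum, its values, and `PerFieldSegmentPolicy` are all 
hypothetical names used for illustration; none of them exist in Lucene.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical policy names; nothing here is an existing Lucene type.
enum SegmentLifespanPolicy {
    TIERED_IMMUTABLE,        // today's behavior: segments are write-once
    LEADING_SEGMENT_APPEND,  // Phase 1 style: largest segment absorbs new nodes
    IN_PLACE_UPDATABLE       // Phase 3 style: small deltas applied in place
}

final class PerFieldSegmentPolicy {
    private final SegmentLifespanPolicy defaultPolicy;
    private final Map<String, SegmentLifespanPolicy> overrides = new HashMap<>();

    PerFieldSegmentPolicy(SegmentLifespanPolicy defaultPolicy) {
        this.defaultPolicy = defaultPolicy;
    }

    // Registers an explicit policy for one field; other fields are unaffected.
    PerFieldSegmentPolicy optIn(String field, SegmentLifespanPolicy policy) {
        overrides.put(field, policy);
        return this;
    }

    // Fields without an override keep the default policy.
    SegmentLifespanPolicy policyFor(String field) {
        return overrides.getOrDefault(field, defaultPolicy);
    }
}
```

   With this shape, only a field that was explicitly registered (for example a 
vector field) resolves to a non-default policy, and every other field keeps 
the tiered, immutable behavior.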
   
   **Phase 3** — Optional in-place updatable graph segments
   
   This is the long-term goal: providing an interface that allows graph-based 
fields to apply small deltas in-place and persist them without requiring a full 
segment rewrite. Our own on-disk representations already support this pattern 
efficiently, and exposing these lower-level primitives in Lucene would allow 
experimentation — both within Lucene and by external codecs — without 
committing Lucene to a particular implementation.
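   
   One way to picture an in-place delta is a file of fixed-width neighbor 
records, where updating a node overwrites only that node's record. The sketch 
below assumes such a layout purely for illustration: the 
`InPlaceNeighborStore` class and its record format are hypothetical and do not 
describe JVector's or Lucene's actual on-disk formats.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical layout: one fixed-width record per node,
// [int count][int neighbor_0 ... neighbor_{maxDegree-1}], zero padded.
final class InPlaceNeighborStore implements AutoCloseable {
    private final FileChannel channel;
    private final int maxDegree;
    private final int recordBytes;

    InPlaceNeighborStore(Path path, int maxDegree) throws IOException {
        this.channel = FileChannel.open(
            path,
            StandardOpenOption.CREATE,
            StandardOpenOption.READ,
            StandardOpenOption.WRITE);
        this.maxDegree = maxDegree;
        this.recordBytes = Integer.BYTES * (1 + maxDegree);
    }

    // Overwrites the record of a single node; no other bytes are touched.
    void writeNeighbors(int node, int[] neighborIds) throws IOException {
        if (neighborIds.length > maxDegree) {
            throw new IllegalArgumentException("too many neighbors");
        }
        ByteBuffer buf = ByteBuffer.allocate(recordBytes);
        buf.putInt(neighborIds.length);
        for (int n : neighborIds) {
            buf.putInt(n);
        }
        buf.rewind(); // write the whole record, including zero padding
        long pos = (long) node * recordBytes;
        while (buf.hasRemaining()) {
            pos += channel.write(buf, pos);
        }
    }

    int[] readNeighbors(int node) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(recordBytes);
        long pos = (long) node * recordBytes;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos);
            if (n < 0) {
                throw new IOException("record lies past end of file");
            }
            pos += n;
        }
        buf.flip();
        int[] out = new int[buf.getInt()];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getInt();
        }
        return out;
    }

    long sizeInBytes() throws IOException {
        return channel.size();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

   Because every record has the same width, adding an edge to one node costs a 
single record-sized write, independent of how many nodes the segment holds. 
Crash safety and concurrent readers are out of scope for this sketch.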


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

