Adrien Grand created LUCENE-10217:
-------------------------------------

             Summary: BufferedUpdates is memory inefficient
                 Key: LUCENE-10217
                 URL: https://issues.apache.org/jira/browse/LUCENE-10217
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Adrien Grand


I recently got a question from [~David Turner] about why {{IndexWriter}} was 
flushing data so frequently despite very small documents. After investigating, 
we noticed that most of the RAM buffer was actually spent on BufferedUpdates 
since his test was using {{IndexWriter#updateDocument}}. This is not surprising 
given that BufferedUpdates accounts for BYTES_PER_DEL_TERM=160 bytes per update, 
plus the length of the field name and the length of the term, so it often takes 
around 200 bytes just to record the updated term.
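
To make the arithmetic concrete, here is a minimal sketch of the per-update cost described above. It is not the actual BufferedUpdates code: the class {{UpdateRamEstimate}} and its methods are hypothetical, the 160-byte constant is taken from the text above, and measuring "length" as UTF-8 bytes is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical back-of-the-envelope estimator, not part of Lucene. */
public final class UpdateRamEstimate {

    // Fixed overhead per buffered delete term, as quoted in the issue.
    static final long BYTES_PER_DEL_TERM = 160;

    // Estimated RAM charged for one update: fixed overhead plus field and term length.
    // Assumption: lengths are counted in UTF-8 bytes.
    static long bytesPerUpdate(String field, String term) {
        return BYTES_PER_DEL_TERM
                + field.getBytes(StandardCharsets.UTF_8).length
                + term.getBytes(StandardCharsets.UTF_8).length;
    }

    // How many updates fit in a RAM buffer if nothing else were stored in it.
    static long updatesThatFit(long ramBufferBytes, long bytesPerUpdate) {
        return ramBufferBytes / bytesPerUpdate;
    }

    public static void main(String[] args) {
        // Example: an "id" field holding a 36-character UUID string.
        long perUpdate = bytesPerUpdate("id", "123e4567-e89b-12d3-a456-426614174000");
        long ramBuffer = 16L * 1024 * 1024; // assumed 16 MB RAM buffer
        System.out.println("bytes per update: " + perUpdate);
        System.out.println("updates per buffer: " + updatesThatFit(ramBuffer, perUpdate));
    }
}
```

With a two-character field name and a 36-character term, the estimate comes to 198 bytes, in line with the "around 200 bytes" figure above.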

As a comparison, Lucene's nightly NYC taxis benchmark only needs 286 bytes per 
document in the RAM buffer for about 20 fields, or ~15 bytes per field 
(http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#index_docs_per_mb_ram).

Updates are expected to be slower than appending given that they need to look 
up terms in the dictionary, but I suspect that this memory inefficiency is 
making updates even slower by forcing Lucene to flush its RAM buffer much more 
frequently than it has to when purely appending documents.



