[
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-1516:
---------------------------------------
Attachment: ssd2.png
OK, using the last patch, I ran another near real-time test with this
alg:
{code}
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
merge.policy=org.apache.lucene.index.LogDocMergePolicy
docs.file=/Volumes/External/lucene/wiki.txt
doc.stored = false
doc.term.vector = false
doc.add.log.step=10
max.field.length=2147483647
directory=FSDirectory
autocommit=false
compound=false
merge.factor = 10
ram.flush.mb = 128
doc.maker.forever = false
doc.random.id.limit = 3204040
work.dir=/lucene/work
{ "BuildIndex"
- OpenIndex
- NearRealtimeReader(1)
{ "UpdateDocs" UpdateDoc > : 100000 : 50/sec
- CloseIndex
}
RepSumByPrefRound BuildIndex
{code}
It opens a full (3.2M docs, previously built) Wikipedia index, then
randomly selects a doc and updates it (deletes old, adds new) at the
rate of 50 docs/sec. While that runs, once per second I open a new
reader and run the same search (term "1", sorted by date).
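For a rough sense of scale, the numbers in the alg can be worked through as below. This is only a back-of-the-envelope sketch: it assumes the 50/sec update rate is actually sustained and that the reader is reopened exactly once per second. The class and method names are made up for illustration.

```java
final class NrtWorkload {
    // Values copied from the alg above.
    static final int TOTAL_UPDATES = 100_000;   // "UpdateDocs" repeat count
    static final int UPDATES_PER_SEC = 50;      // target rate, 50/sec
    static final int INDEX_DOCS = 3_204_040;    // doc.random.id.limit

    // Assumption: one reopen per second, as described in the text.
    static final int REOPEN_INTERVAL_SEC = 1;

    // Total run length if the update rate is sustained.
    static int runSeconds() {
        return TOTAL_UPDATES / UPDATES_PER_SEC;
    }

    // Number of reader reopens over the whole run.
    static int reopens() {
        return runSeconds() / REOPEN_INTERVAL_SEC;
    }

    // Updates (each one delete plus one add) that accumulate between reopens.
    static int updatesPerReopen() {
        return UPDATES_PER_SEC * REOPEN_INTERVAL_SEC;
    }

    // Upper bound only: random ids can repeat, so fewer distinct docs may be touched.
    static double maxFractionUpdated() {
        return (double) TOTAL_UPDATES / INDEX_DOCS;
    }

    public static void main(String[] args) {
        System.out.println("run length (sec):    " + runSeconds());
        System.out.println("reopens:             " + reopens());
        System.out.println("updates per reopen:  " + updatesPerReopen());
        System.out.printf("max fraction updated: %.1f%%%n", 100.0 * maxFractionUpdated());
    }
}
```

Under those assumptions the run lasts 2000 seconds (about 33 minutes) and performs roughly 2000 reopens, each picking up about 50 updates. At most about 3% of the index is rewritten over the whole run.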
I attached another graph (ssd2.png) with the results, showing reopen &
search time as a function of how many updates have been done; rough
comments:
* Search time is pretty constant at ~35 msec and remarkably
non-noisy, except for occasional glitches where it goes as high
as ~340 msec. Net/net, very reasonable I think.
* Reopen time is also fast (~40 msec) but is noisier.
* It's not clear the merges are really impacting things that much.
It could simply be that I didn't run the test for long enough for
a big merge to run. Also, this index has neither stored fields nor
term vectors, so if we added those, merges would get slower.
* This is a better test than the last one, since it's doing some
deletes.
* Since I open the writer with autoCommit false, and near-realtime
carries all pending deletes in RAM, no *.del file ever gets
written to the index.
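The last point can be pictured with a toy model. The sketch below is not Lucene code and says nothing about how the patch is actually implemented; `ToyWriter`, `ToySnapshot`, and their methods are hypothetical names. It only illustrates the idea that a reader obtained from the writer can see deletes that are still buffered in memory, while a reader opened earlier keeps its point-in-time view.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Point-in-time view of the toy index. It copies the writer's state when
// created, so later updates do not show up in it.
final class ToySnapshot {
    private final Map<Integer, String> docs;
    private final Set<Integer> deleted;

    ToySnapshot(Map<Integer, String> docs, Set<Integer> deleted) {
        this.docs = new HashMap<>(docs);        // full copy: fine for a toy, not for 3.2M docs
        this.deleted = new HashSet<>(deleted);
    }

    boolean isLive(int id) {
        return docs.containsKey(id) && !deleted.contains(id);
    }

    int numDocs() {
        return docs.size() - deleted.size();
    }
}

// Toy writer: deletes are only buffered in memory, nothing is written to disk.
final class ToyWriter {
    private final Map<Integer, String> docs = new HashMap<>();
    private final Set<Integer> pendingDeletes = new HashSet<>();
    private int nextId = 0;

    int add(String body) {
        docs.put(nextId, body);
        return nextId++;
    }

    // Update = mark the old doc deleted (in RAM only), then add the new version.
    int update(int oldId, String body) {
        if (!docs.containsKey(oldId)) {
            throw new IllegalArgumentException("unknown doc id: " + oldId);
        }
        pendingDeletes.add(oldId);
        return add(body);
    }

    // Hand out a view that already reflects the buffered deletes.
    ToySnapshot getReader() {
        return new ToySnapshot(docs, pendingDeletes);
    }
}
```

In this model, a snapshot taken before `update` still reports the old document as live, while one taken afterwards sees only the new version, even though no delete was ever persisted.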
> Integrate IndexReader with IndexWriter
> ---------------------------------------
>
> Key: LUCENE-1516
> URL: https://issues.apache.org/jira/browse/LUCENE-1516
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.4
> Reporter: Jason Rutherglen
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
> LUCENE-1516.patch, LUCENE-1516.patch, magnetic.png, ssd.png, ssd2.png
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> The current problem is that an IndexReader and IndexWriter cannot be
> open at the same time and perform updates, as they both require a
> write lock to the index. While methods such as IW.deleteDocuments
> enable deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates.
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone.