[ 
https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722006#action_12722006
 ] 

Shai Erera commented on LUCENE-1705:
------------------------------------

My search app has such a scenario, and currently we just delete all the 
documents matching a certain criterion (something similar to the 
MatchAllDocsQuery above). But I actually think that's the wrong approach. If 
you want to delete all the documents from the index, you'd better create a new 
one. The main reason is that if your index has, say, 10M documents, a 
deleteAll() will keep those 10M in the index, and when you re-index, the index 
size will be doubled. Worse still, the deleted documents may belong to segments 
which will not be merged/optimized right away (depending on your mergeFactor 
setting), and will therefore stick around for a long time (until you call 
optimize() or expungeDeletes()).

But creating a new IndexWriter right away, overwriting the current index, is 
not so smart either, because your users will be left w/ no search results until 
the new index has accumulated enough documents. Therefore I think the solution 
for such a scenario should be:
# Call writer.rollback() - abort all current operations and discard everything 
since the last commit.
# Create a new IndexWriter in a new directory and re-index everything.
# In the meantime, all your search operations go against the current index, 
which you know is not going to change until the other one is rebuilt. You can 
therefore also optimize things by opening an IndexReader once and stopping any 
accounting your code may do - just leave it open.
# When re-indexing has completed, sync all your code and:
#* Define your workDir to be the new index dir. That way new searches can begin 
right away on the new index.
#* Safely delete the old index dir (probably need to do something here to 
ensure no readers are open against this dir etc.).
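
The switch-over in the last step can be sketched roughly as follows. This is 
plain Java with no Lucene calls at all; IndexDirSwitcher, currentDir() and 
switchTo() are made-up names for illustration only. The re-indexing itself, and 
the check that no reader is still open against the old dir, are assumed to 
happen in the application and are not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: tracks which directory new searches should read from. */
final class IndexDirSwitcher {
    // The "workDir" from the steps above; replaced atomically on switch-over.
    private final AtomicReference<Path> workDir;

    IndexDirSwitcher(Path initialDir) {
        this.workDir = new AtomicReference<>(initialDir);
    }

    /** Directory that new searches should open right now. */
    Path currentDir() {
        return workDir.get();
    }

    /**
     * Publishes a fully rebuilt index directory and returns the retired one.
     * The caller deletes the retired directory once no reader uses it.
     */
    Path switchTo(Path rebuiltDir) {
        return workDir.getAndSet(rebuiltDir);
    }

    /** Recursively deletes a retired index directory, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts files and subdirectories ahead of their parent.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

The idea is that searches always ask currentDir() before opening a reader, so 
the old index keeps serving until switchTo() is called with the rebuilt dir, 
and only the directory returned by switchTo() is ever passed to 
deleteRecursively().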

That's a high-level description and I realize it may have some holes here and 
there, but you get the point.

If we were to create a deleteAll() method, I'd expect it to work that way. 
I.e., the solution you proposed above (write a new segments file referencing no 
segments) would prevent all searches until something new is actually 
re-indexed, right?

I have to admit though, that I don't have an idea yet on how it can be done 
inside Lucene, such that new readers will see the old segments, while when I 
finish re-indexing and call commit, the previous segments will just be deleted.

A wild shot (and then I'll go to sleep on it) - how about if you re-index 
everything, not committing during that time at all. Readers that are open 
against the current directory will see all the documents, EXCEPT the new ones 
you're adding (same for new readers that you may open). When you're done 
re-indexing, you'll call a commitNewOnly, which will create an empty segments 
file and then call commit. That way, assuming you're using 
KeepOnlyLastCommitDeletionPolicy, once the existing readers close, any 
new reader that is opened will see only the new segments, and the next 
time you commit, the old segments will be deleted.

That will move the deleteAll() method to the application side, since it knows 
when it can safely delete all the current segments. If you don't have such a 
requirement (keeping an index for searches until re-indexing is complete), then 
I think you can safely close() the index and re-create it?

> Add deleteAllDocuments() method to IndexWriter
> ----------------------------------------------
>
>                 Key: LUCENE-1705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1705
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Tim Smith
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the 
> IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using 
> deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large 
> existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a 
> commit(), as far as index visibility goes (new IndexReader opening would get 
> the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice 
> to finally see this added such that the IndexWriter would not need to be 
> closed to perform the "clear" as this seems to be the general recommendation 
> for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been 
> received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons i would ever need to close 
> an IndexWriter and reopen a new one 
