On 9/14/06, David Balmain <[EMAIL PROTECTED]> wrote:
> On 9/14/06, Neville Burnell <[EMAIL PROTECTED]> wrote:
> > Hi David,
> >
> > > Deleted documents don't get deleted until commit is called
> >
> > Ok, but FYI, my experiments show that #commit doesn't affect #doc_count,
> > even across Ruby sessions.
>
> Sorry, I guess I wasn't very clear on that point. The deletes don't get
> committed until commit is called, which is why I don't have a num_docs
> method in IndexWriter: there is no way to reliably tell
> until commit is called. IndexWriter#doc_count is like
> IndexReader#max_doc. It tells you how many documents there are in the
> index, deleted or not.
>
> > On a different note, I'd like to request a variation of #add_document
> > which returns the doc_id of the document added, as opposed to self.
> >
> > I'm trying to track down an issue with a large test index [600MB, 500k
> > docs] in which I need to update a document. The old document is deleted
> > then added again, but doesn't show up in my searches.
> >
> > A #doc_count on the writer before and after #add_document shows that the
> > index is 1 document larger, but I still can't #search for the updated
> > doc.
> >
> > What do you think about having #add_document "yield" the doc_id if
> > block_given?
> >
> > Neville
>
> How about just using the doc_count method? Call it after you add the
> document and subtract one and you'll have the document ID of the last
> document added. Don't call it before you add the document, as a merge
> might happen when you add the document, possibly changing all document
> IDs when deletes are completely removed.
>
> Cheers,
> Dave

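To make the quoted advice concrete, here is a toy Ruby model of the ID arithmetic. It is not Ferret code: ToyWriter, merge_every and id_of are made-up names, and the "merge on every Nth add" rule is an assumption chosen only so that a merge happens at a predictable point. It just shows why doc_count - 1 is right when taken after the add, and why a count taken before the add can be stale.

```ruby
# Toy model only: this is NOT Ferret's implementation. ToyWriter, merge_every
# and id_of are hypothetical names used to illustrate the ID arithmetic.
class ToyWriter
  def initialize(merge_every: 4)
    @slots = []                 # one slot per document ID; nil marks a deleted doc
    @merge_every = merge_every  # assumption: a merge fires on every Nth add
    @adds = 0
  end

  # Modelled on the doc_count described above: deleted documents still count.
  def doc_count
    @slots.size
  end

  # Marks a document as deleted without freeing its ID.
  def delete(id)
    @slots[id] = nil
  end

  # Returns self, like the add_document discussed in the thread.
  def add_document(doc)
    @adds += 1
    merge! if (@adds % @merge_every).zero? # the merge runs before the append
    @slots << doc
    self
  end

  # Current ID of a document (test helper, not part of any real API).
  def id_of(doc)
    @slots.index(doc)
  end

  private

  # A merge drops deleted slots, so every later document ID shifts down.
  def merge!
    @slots.compact!
  end
end

w = ToyWriter.new(merge_every: 4)
w.add_document("a").add_document("b").add_document("c")
w.delete(0)                  # "a" is deleted but still counted

guess_before = w.doc_count   # 3: the ID "d" would get if no merge happened
w.add_document("d")          # 4th add triggers a merge; "a" is removed for good
id_after = w.doc_count - 1   # 2: the ID "d" actually has

puts "guess before add: #{guess_before}, actual ID: #{id_after}"
```

In this model "b" also moves from ID 1 to ID 0 when the merge runs, which is the same reason an ID returned by the writer could not be trusted after the next add.
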
I should also mention the reason I wouldn't want to return the document ID from any IndexWriter method is that the document ID could become invalid when the next document is added (if a segment merge is triggered and deletes exist). At least when using an IndexReader, the document ID is valid for the life of the reader.
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk

