On 9/14/06, David Balmain <[EMAIL PROTECTED]> wrote:
> On 9/14/06, Neville Burnell <[EMAIL PROTECTED]> wrote:
> > Hi David,
> >
> > > Deleted documents don't get deleted until commit is called
> >
> > Ok, but FYI, my experiments show that #commit doesn't affect #doc_count,
> > even across ruby sessions.
>
> Sorry, I guess I wasn't very clear on that point. The deletes don't get
> committed until commit is called, which is why I don't have a num_docs
> method in IndexWriter: there is no way to reliably tell until commit is
> called. IndexWriter#doc_count is like IndexReader#max_doc. It tells you
> how many documents there are in the index, deleted or not.
>
> > On a different note, I'd like to request a variation of #add_document
> > which returns the doc_id of the document added, as opposed to self.
> >
> > I'm trying to track down an issue with a large test index [600MB, 500k
> > docs] in which I need to update a document. The old document is deleted
> > then added again, but doesn't show up in my searches.
> >
> > A #doc_count on the writer before and after #add_document shows that the
> > index is 1 document larger, but I still can't #search for the updated
> > doc.
> >
> > What do you think about having #add_document "yield" the doc_id if
> > block_given?
> >
> > Neville
>
> How about just using the doc_count method? Call it after you add the
> document and subtract one and you'll have the document ID of the last
> document added. Don't call it before you add the document as a merge
> might happen when you add the document, possibly changing all document
> IDs when deletes are completely removed.
>
> Cheers,
> Dave
>

I should also mention the reason I wouldn't want to return the
document ID from any IndexWriter method is that the document ID could
become invalid when the next document is added (if a segment merge is
triggered and deletes exist). At least when using an IndexReader, the
document ID is valid for the life of the reader.
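
To illustrate why an ID handed back by the writer could go stale, here is a
toy model in plain Ruby. It is not Ferret code: ToyIndex and all of its
methods (including num_docs, merge and id_of) are made up for this sketch,
and the merge is triggered by hand rather than by adding a document. It only
shows the bookkeeping described above, under the assumption that a merge
drops deleted slots and renumbers the survivors.

```ruby
# Toy model only. ToyIndex is hypothetical and is not Ferret's API.
class ToyIndex
  def initialize
    @slots = []    # position in this array plays the role of the document ID
    @deleted = []  # parallel flags marking which slots are deleted
  end

  # Counts every slot, deleted or not (the role doc_count/max_doc plays above).
  def doc_count
    @slots.size
  end

  # Counts only live documents.
  def num_docs
    @deleted.count(false)
  end

  # Appends a document and returns self so calls can be chained.
  def add(doc)
    @slots << doc
    @deleted << false
    self
  end

  # Marks a slot as deleted without removing it.
  def delete(id)
    @deleted[id] = true
  end

  # Simulated segment merge: deleted slots are dropped, survivors renumbered.
  def merge
    live = @slots.each_index.reject { |i| @deleted[i] }
    @slots = live.map { |i| @slots[i] }
    @deleted = Array.new(@slots.size, false)
  end

  # Current ID (array position) of a document.
  def id_of(doc)
    @slots.index(doc)
  end
end

index = ToyIndex.new
index.add("a").add("b").add("c")
index.delete(0)                      # "a" is marked deleted but still counted

index.add("d")
id_after_add = index.doc_count - 1   # ID of the last document added
count_before_merge = index.doc_count # includes the deleted slot
live_before_merge = index.num_docs   # excludes it

index.merge                          # deleted slot removed, IDs shift down
id_after_merge = index.id_of("d")

puts "ID of d right after add: #{id_after_add}"
puts "ID of d after merge:     #{id_after_merge}"
```

In this model the ID computed right after the add no longer points at "d"
once the merge has run, which is the situation I want to avoid exposing from
the writer.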
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk