On 9/14/06, Neville Burnell <[EMAIL PROTECTED]> wrote:
> I'm playing with "updating" docs in my index, and I think I've found a bug
> with IndexWriter counting deleted docs. Script and output follow:
>
> =====
> require 'rubygems'
> require 'ferret'
>
> p Ferret::VERSION
>
> @doc = {:id => '44', :name => 'fred', :email => '[EMAIL PROTECTED]'}
>
> @dir = Ferret::Store::RAMDirectory.new
>
> def add_then_delete_fred
> @writer = Ferret::Index::IndexWriter.new(:dir => @dir)
>
> p "adding doc :id=#{@doc[:id]}"
> @writer << @doc
> p "doc_count=#{@writer.doc_count}"
>
> p "deleting doc :id=#{@doc[:id]}"
> @writer.delete(:id, @doc[:id])
> p "doc_count=#{@writer.doc_count}"
>
> @writer.commit
> @writer.close
> @writer = nil
> end
>
> add_then_delete_fred
> add_then_delete_fred
> add_then_delete_fred
>
> @reader = Ferret::Index::IndexReader.new(@dir)
> p "reader [EMAIL PROTECTED]"
>
> @writer = Ferret::Index::IndexWriter.new(:dir => @dir)
> p "writer count=#{@writer.doc_count}"
>
> ===
>
> $>ruby test_delete.rb
> "0.10.4"
> "adding doc :id=44"
> "doc_count=1"
> "deleting doc :id=44"
> "doc_count=1"
> "adding doc :id=44"
> "doc_count=2"
> "deleting doc :id=44"
> "doc_count=2"
> "adding doc :id=44"
> "doc_count=3"
> "deleting doc :id=44"
> "doc_count=3"
> "reader count=0"
> "writer count=3"
Hi Neville,
Unfortunately this is the way it has to work. Deleted documents don't
get deleted until commit is called, so there is no way to reliably tell
how many undeleted documents exist in the index from the IndexWriter.
It's a performance thing. I should change IndexWriter#doc_count to
IndexWriter#max_doc to be consistent with IndexReader.
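
To picture why the two counts differ, here is a minimal toy model in plain
Ruby. It is only a sketch and not Ferret's implementation: the ToyIndex class
is hypothetical, and its max_doc and num_docs methods are named by analogy
with IndexReader. It assumes deletes are buffered until commit and that a
committed delete only marks a slot rather than freeing it.

```ruby
require 'set'

# Toy model only (hypothetical class, not Ferret code).
class ToyIndex
  def initialize
    @docs = []             # every slot ever written, deleted or not
    @deleted = Set.new     # slot numbers marked as deleted
    @pending_deletes = []  # [field, value] pairs buffered until commit
  end

  # Adding a document always takes a new slot.
  def <<(doc)
    @docs << doc
    self
  end

  # A delete is only recorded here; nothing is marked yet.
  def delete(field, value)
    @pending_deletes << [field, value]
  end

  # On commit, buffered deletes mark matching slots as deleted.
  def commit
    @pending_deletes.each do |field, value|
      @docs.each_with_index do |doc, i|
        @deleted << i if doc[field] == value
      end
    end
    @pending_deletes.clear
  end

  # Number of slots, including deleted ones.
  def max_doc
    @docs.size
  end

  # Number of slots not marked as deleted.
  def num_docs
    @docs.size - @deleted.size
  end
end

index = ToyIndex.new
doc = { :id => '44', :name => 'fred' }

3.times do
  index << doc
  index.delete(:id, doc[:id])
  index.commit
end

puts "max_doc=#{index.max_doc}"    # prints max_doc=3
puts "num_docs=#{index.num_docs}"  # prints num_docs=0
```

Under these assumptions the slot count keeps growing while the live count
stays at zero, which matches the "writer count=3" and "reader count=0" lines
in your output.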
Cheers,
Dave