Hi. Do you see the same problem if you create the Directory with the default lock factory? If I understand correctly, you create a writer from the directory and later obtain a reader from that writer. Documents added to the writer after that point will not be searchable until a new reader has been opened, so it is expected that they are not reflected immediately.
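To illustrate the point, here is a minimal sketch against the Lucene.Net 2.9.x API. The field names (`_writer`, `_reader`, `_searcher`) mirror the code quoted below; the method name `RefreshSearcher` is hypothetical, and this is a sketch rather than a drop-in fix:

```csharp
// Sketch, Lucene.Net 2.9.x: an NRT reader is a point-in-time snapshot.
// After updating, ask the writer for a fresh reader before searching;
// no Commit is needed for the documents to become searchable.
public void RefreshSearcher()
{
    IndexReader newReader = _writer.GetReader(); // sees uncommitted updates
    lock( this )
    {
        IndexReader oldReader = _reader;
        IndexSearcher oldSearcher = _searcher;
        _reader = newReader;
        _searcher = new IndexSearcher( _reader );
        // Close the old snapshot so its files can be released.
        oldSearcher.Close();
        oldReader.Close();
    }
}
```

Calling something like this after each UpdateDocument (or on a short timer) is what makes the "searchable within milliseconds" behaviour appear; a commit alone does not refresh an already-open reader.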
// Patric

On 21 November 2012 21:29, Gerry Suggitt <[email protected]> wrote:

> Yes, I am using NRT. (Or I should say, I am trying to!) I commit within
> 10 seconds after a document is added (when a document arrives I start a
> timer to allow more documents to come in before making the commit).
>
> And before I get into more details of the bug I reported, may I ask you a
> question about NRT?
>
> ------------------------------------------------------------------------
> start of NRT question
>
> According to the documentation (at least as it is described for Java), NRT
> should allow "updates to be efficiently searched hopefully within
> milliseconds after an update is complete". I have found that after an
> update, the document is not found until I have performed a commit.
>
> Here is the code that creates the reader, writer and searcher. _flusher is
> the 10-second commit timer.
>
>     public void Start()
>     {
>         _logger.Info( () => "LuceneEngine.Start " + _pathname );
>         System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo( _pathname );
>         // NoLockFactory is OK for now because a single thread accesses the directory.
>         _directory = FSDirectory.Open( dir, new NoLockFactory() );
>         _analyzer = new PerFieldAnalyzerWrapper( new WhitespaceAnalyzer() );
>         _writer = new IndexWriter( _directory, _analyzer,
>             IndexWriter.MaxFieldLength.UNLIMITED );
>         _logger.Info( () => "LuceneEngine.Start begin Optimize" );
>         _writer.Optimize();
>         _logger.Info( () => "LuceneEngine.Start Optimize finished" );
>         _reader = _writer.GetReader();
>         _searcher = new IndexSearcher( _reader );
>         _flusher = new _PeriodicDocFlusher( 10000, _maxDocsInCache,
>             _OnFlushTimer, _logger );
>         _logger.Info( () => "LuceneEngine.Start " + _pathname + " - up and running" );
>     }
>
> And here is the code that uses it. _DocAdded() just starts the commit
> timer if it is not already running.
>     public void UpdateDocument( string id, Document doc )
>     {
>         _writer.UpdateDocument( new Term( "id", id ), doc, _analyzer );
>         _DocAdded();
>     }
>
>     public TopDocs Search( Query query, int maxDocsToReturn )
>     {
>         lock( this )
>         {
>             return _searcher.Search( query, maxDocsToReturn );
>         }
>     }
>
> To test it, after making a call to UpdateDocument with id = xxx, I looped
> making repeated calls to Search where the query was id:xxx. This
> continually returned 0 documents until the timer kicked in and performed
> the commit. And then Search returned 1 hit.
>
> So is this expected? I didn't think so, but maybe I just misinterpreted
> the documentation.
>
> ------------------------------------------------------------------------
> end of NRT question
>
> Back to the issue at hand ...
>
> I tried to reproduce the problem as I described it and was unable to, so
> I am uncertain what was happening there.
>
> But I have some more information about the lost databases on our test
> machines.
>
> When I say the databases were completely empty, the directory actually
> held two files:
>
>     segments.gen
>     segments_1
>
> On one machine we restored the data from an external backup (actually a
> SQL database!) and everything worked fine from then on. We could see
> several files in the database directory.
>
> The other lab machine was untouched, and here we discovered something
> that might be important.
>
> We noticed on the first machine that after restoring (which essentially
> performed a series of _writer.UpdateDocument calls) and after we stopped
> and started the Lucene service, the timestamp on segments.gen had changed
> and we now had a file segments_2. (I know, I know, you are going "well,
> duh", but hold on a sec.)
>
> On the second machine we had not touched anything, and the time on the
> segments.gen file was November 12, 12:31 PM.
>
> But the reboot of the machines occurred on November 17 at 3:30 PM.
> So why wasn't the timestamp updated? My guess: because there were no
> index files in the directory!
>
> But ... I have logs that show 500 documents being added successfully to
> the database AFTER November 12, 12:31 PM. And these logs show commits
> being performed.
>
> Furthermore, searches are returning documents.
>
> So it appears (and this is just my guess) that the commits were making
> the necessary updates to the in-memory data structures that allowed
> searches to work, but the data was never saved to disk. No exception
> occurred that might have been thrown as a result of a failure to write to
> the disk, so at this point I am baffled.
>
> Now, why the data was not saved to disk last week but is being saved this
> week is beyond me.
>
> I know we don't have much to work with. I will continue to see if I can
> reproduce the problem. If there is anything else you would like me to
> check, please ask.
>
> Thanks - Gerry
>
>
> ----- Original Message -----
> From: Simon Svensson
> To: [email protected]
> Cc: Gerry Suggitt
> Sent: Wednesday, November 21, 2012 3:05 AM
> Subject: Re: This may be a bug
>
> Hi,
>
> This does indeed sound serious. Are you saying that you have a snapshot
> (with committed documents) that is cleared when calling
> IndexWriter.Optimize? Can you share it for reproduction purposes?
>
> Are you using near-real-time indexing? What you describe could happen if
> you were using NRT and never called IndexWriter.Commit. The index would
> indeed be cleared the next time a writer is opened against the directory,
> as a step in clearing out unused index files - a kind of rollback of
> non-committed changes.
>
> // Simon
>
> On 2012-11-20 16:45, Gerry Suggitt wrote:
> > Sorry to send this email directly to the developers, but I couldn't
> > see any other way of entering a defect.
> >
> > My name is Gerry Suggitt and I work for Leafsprout Technologies, a
> > company that creates products for the Medical Information sector.
> > We have created a Master Patient Index using Lucene that works very
> > well - we are able to perform fuzzy matching and all the nice things
> > that you want in an MPI.
> >
> > But something terrible just happened. Fortunately this occurred in our
> > own lab - we have not yet released the product to the field.
> >
> > Sometime over the weekend, the computers holding the Lucene database
> > rebooted (probably from a Windows upgrade). All of the Lucene databases
> > were blown away! Completely empty!
> >
> > Recently, I had noticed the same thing when I was doing some testing,
> > so it may be related.
> >
> > We are currently using version 2.9.4.1.
> >
> > What I was doing in my testing was taking a snapshot of the Lucene
> > database files (just a copy to another directory). I would run some
> > tests which would affect the database, so before continuing I would
> > copy the snapshot back.
> >
> > When I started the Lucene service, the database was blown away!
> > Completely empty!
> >
> > I was able to determine what was doing this. At startup, I was
> > performing an optimize. This seemed like a good time for it: at startup
> > we know no client is making demands on the system. When I commented out
> > the call to optimize, the database remained intact on startup.
> >
> > The systems that lost their databases still had the call to optimize in
> > them.
> >
> > Please help!
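Simon's rollback explanation in the quoted thread can be sketched as follows. This is a hedged illustration against the Lucene.Net 2.9.x API, not the poster's actual code; `_writer` and the timer callback name are assumptions. The key distinction is that NRT visibility is not durability: updates seen through `IndexWriter.GetReader()` are searchable immediately, but only `Commit()` makes them survive a new writer being opened against the directory.

```csharp
// Sketch: NRT visibility vs. durability in Lucene.Net 2.9.x.
// Updates seen through _writer.GetReader() are searchable at once, but
// until _writer.Commit() runs they exist only as uncommitted state.
// Opening a new IndexWriter against the same directory discards that
// state (a rollback of non-committed changes), which matches the
// "empty index after restart" symptom described in the thread.
private void _OnFlushTimer()
{
    // Make all pending adds/updates/deletes durable on disk; after this,
    // a restart (or an Optimize from a freshly opened writer) cannot
    // roll them back.
    _writer.Commit();
}
```

Under this reading, the Optimize call at startup was not the root cause; it merely triggered the cleanup of files belonging to changes that had never been committed.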
