Howard van Rooijen created LUCENENET-600:
--------------------------------------------

             Summary: Creating an IndexWriter with a RamDirectory causes two 
exceptions to be thrown
                 Key: LUCENENET-600
                 URL: https://issues.apache.org/jira/browse/LUCENENET-600
             Project: Lucene.Net
          Issue Type: Bug
          Components: Lucene.Net Core
    Affects Versions: Lucene.Net 4.8.0
            Reporter: Howard van Rooijen


I have a document scoring algorithm built on top of Lucene. I've just upgraded 
it to the 4.8.0-beta00005 packages (great job by the way).

We essentially create an in-memory index for a single document in order to do 
some parsing / processing / scoring / classification.

I noticed while running our test suite that the CPU was spiking and also 
noticed that a large number of first chance exceptions were being generated by 
these two lines of code:

{{var directory = new RAMDirectory();}}
{{var indexWriter = new IndexWriter(directory, new 
IndexWriterConfig(LuceneVersion.LUCENE_48, new 
ScorableDocumentAnalyzer(LuceneVersion.LUCENE_48)));}}

The first exception is:

{{'System.IO.FileNotFoundException' in Lucene.Net.dll ("segments.gen"). }}

The second exception is:

{{'Lucene.Net.Index.IndexNotFoundException' in Lucene.Net.dll ("no segments* 
file found in RAMDirectory@21af1a5 
lockFactory=Lucene.Net.Store.SingleInstanceLockFactory:}}
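
For anyone wanting to observe this in isolation, the following is a minimal sketch of a console repro. It is built on assumptions rather than taken from our code base: {{StandardAnalyzer}} (from the Lucene.Net.Analysis.Common package) stands in for our own {{ScorableDocumentAnalyzer}}, the class name {{Repro}} is hypothetical, and the {{AppDomain.CurrentDomain.FirstChanceException}} event is used only to log exceptions that are thrown and then caught internally.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical repro harness; not part of Lucene.Net or of our application.
public static class Repro
{
    public static void Main()
    {
        // Log every first-chance exception, including those that are
        // thrown and then caught inside Lucene.Net.
        AppDomain.CurrentDomain.FirstChanceException += (sender, e) =>
            Console.WriteLine($"{e.Exception.GetType().FullName}: {e.Exception.Message}");

        // Assumption: StandardAnalyzer replaces the custom ScorableDocumentAnalyzer.
        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);

        // The same two lines as above, wrapped in using blocks so the
        // writer is disposed before the directory.
        using (var directory = new RAMDirectory())
        using (var indexWriter = new IndexWriter(directory, config))
        {
            // Nothing is indexed; constructing the writer is the step under test.
        }
    }
}
```

If the behaviour matches what we see in our test suite, the two exception types quoted above should be logged while the {{IndexWriter}} is being constructed, before any document is added.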

Based on reading / research, I believe this is because the RAMDirectory is 
initialised to be null, and when the IndexWriter is created it tries to query 
the RAMDirectory and a FileNotFoundException is thrown.

Is it possible for the RAMDirectory to be initialised as empty rather than 
null, i.e. so that reading the directory would not throw an exception? This 
might involve adding a "segments.gen" entry and a matching "segments_n" 
segmentinfo entry. Alternatively, is it possible not to throw an exception in 
this use case?

Or do you have a suggestion for how it would be possible to manually initialise 
the RAMDirectory before passing it to the IndexWriter?

Because these two lines are being called per request, we're seeing 2 
exceptions per request. This seems like an expensive way of initialising an 
IndexWriter. We've already had to replace QueryParser with SimpleQueryParser 
because QueryParser was throwing 50+ exceptions internally when being 
instantiated.

If anyone can point me in the right direction, I'd be more than happy to try 
and create a fix / PR. But as RAMDirectory is often used for unit testing 
scenarios, I'm wondering: does anyone have any deep knowledge about why the 
current behaviour is the default?

Many Thanks,

Howard

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
