[
https://issues.apache.org/jira/browse/LUCENENET-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497674#comment-16497674
]
Jens Melgaard commented on LUCENENET-600:
-----------------------------------------
Another option would be to use ANTLR4 to generate a parser. I wanted to add
that information because I have been looking for an ANTLR4 grammar for Lucene
for ages and it was rather difficult to find.
In the end I stumbled on [https://github.com/lrowe/lucenequery]
I have been struggling to integrate it into the Visual Studio + .NET Standard
tool chain, but managed to get it working well enough to at least build it.
However, it is certainly not a pleasant development experience (yet)...
According to [https://github.com/tunnelvisionlabs/antlr4cs] it should work
better if I installed Java to do the code generation, but I didn't bother...
A project example can be seen here:
[https://github.com/dotJEM/json-index/tree/Lucene-v4.8/DotJEM.Json.Index/DotJEM.Json.Index.QueryParsers]
(The scope here is quite a bit broader than just a query parser, and it is only
partially inspired by the lrowe grammar; the main idea was just to get
something simpler working to begin with.)
> Creating an IndexWriter with a RAMDirectory causes two exceptions to be thrown
> ------------------------------------------------------------------------------
>
> Key: LUCENENET-600
> URL: https://issues.apache.org/jira/browse/LUCENENET-600
> Project: Lucene.Net
> Issue Type: Bug
> Components: Lucene.Net Core
> Affects Versions: Lucene.Net 4.8.0
> Reporter: Howard van Rooijen
> Priority: Minor
>
> I have a document scoring algorithm built on top of Lucene. I've just
> upgraded it to the 4.8.0-beta00005 packages (great job by the way).
> We essentially create an in-memory index for a single document in order to do
> some parsing / processing / scoring / classification.
> While running our test suite, I noticed that the CPU was spiking and that a
> large number of first-chance exceptions were being generated by these two
> lines of code:
> {{var directory = new RAMDirectory();}}
> {{var indexWriter = new IndexWriter(directory, new
> IndexWriterConfig(LuceneVersion.LUCENE_48, new
> ScorableDocumentAnalyzer(LuceneVersion.LUCENE_48)));}}
> The first exception is:
> {{'System.IO.FileNotFoundException' in Lucene.Net.dll ("segments.gen"). }}
> The second exception is:
> {{'Lucene.Net.Index.IndexNotFoundException' in Lucene.Net.dll ("no segments*
> file found in RAMDirectory@21af1a5
> lockFactory=Lucene.Net.Store.SingleInstanceLockFactory:}}
> Based on my reading / research, I believe this is because the RAMDirectory is
> initialised to be null, and when the IndexWriter is created it queries the
> RAMDirectory and a FileNotFoundException is thrown.
> Is it possible either to initialise it as empty rather than null, so that
> reading the directory would not throw an exception (this might involve adding
> a "segments.gen" entry and a matching "segments_N" segment info entry), or
> alternatively not to throw an exception in this use case?
> Or do you have a suggestion for how it would be possible to manually
> initialise the RAMDirectory before passing it to the IndexWriter?
> Because these two lines are called per request, we're seeing 2 exceptions
> per request, which seems like an expensive way of initialising an
> IndexWriter. We've already had to replace QueryParser with SimpleQueryParser
> because QueryParser was throwing 50+ exceptions internally when being
> instantiated.
> If anyone can point me in the right direction, I'd be more than happy to try
> to create a fix / PR. But since RAMDirectory is often used in unit testing
> scenarios, I'm wondering whether anyone has deep knowledge of why this is
> the current default behaviour?
> Many Thanks,
> Howard
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)