Hi,

You describe two separate problems: indexing speed and search issues.

Have you done any CPU profiling to determine where to begin looking for your 
slow indexing speed? It sounds like you've ruled out an I/O bottleneck, but it 
could still be a slow database you're reading from. Try simplifying your code 
by removing the references to merge policies (the default policies should be 
enough) and creating new Document/Field instances instead of reusing them. 
Also, move the Optimize and Commit calls outside your While loop.
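
Roughly, the restructured indexing code could look like the sketch below. It is 
untested and assumes the Lucene.Net 2.9 API that your code already uses. 
IndexRecords and its records parameter (path/body pairs) are placeholders I 
made up to stand in for your recordset loop and getNextFile.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports Lucene.Net.Analysis.Standard
Imports Lucene.Net.Documents
Imports Lucene.Net.Index
Imports Lucene.Net.Store

Module IndexSketch
    ' records is a hypothetical stand-in for the database loop:
    ' Key = file path, Value = body text.
    Sub IndexRecords(ByVal records As IEnumerable(Of KeyValuePair(Of String, String)))
        Dim dir As New SimpleFSDirectory(New DirectoryInfo("c:\test"))
        Dim anlz As New StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29)
        ' Default merge policy and buffer settings; nothing is tuned here.
        Dim indx As New IndexWriter(dir, anlz, True, IndexWriter.MaxFieldLength.UNLIMITED)
        Dim intC As Integer = 0

        For Each rec As KeyValuePair(Of String, String) In records
            intC += 1
            ' A fresh Document and fresh Fields for every record.
            Dim doc As New Document()
            doc.Add(New Field("id", intC.ToString(), Field.Store.YES, Field.Index.NO))
            doc.Add(New Field("path", rec.Key, Field.Store.YES, Field.Index.NO))
            doc.Add(New Field("body", rec.Value, Field.Store.YES, Field.Index.ANALYZED))
            indx.AddDocument(doc)
        Next

        ' Optimize and commit once, after all documents have been added.
        indx.Optimize()
        indx.Commit()
        indx.Close()
        dir.Close()
    End Sub
End Module
```

If this version is still slow, the time is most likely being spent outside 
Lucene, which is what the profiling should confirm.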

Your search issue is probably due to your use of StandardAnalyzer. It does not 
know the secret meaning of "I-A-05.50" (Product number? Secret identifier?) and 
will tokenize that into "I" and "05.50". The "A" will be skipped as it is a 
default stopword. I have to admit a lack of knowledge regarding 
StandardAnalyzer's use of positional information. You're currently searching 
for the phrase "I 05.50" or "I [anything] 05.50".
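
One way to check what the analyzer really does with that string is to print 
the parsed query and look at the terms it contains. This is a minimal, untested 
sketch that assumes the Lucene.Net 2.9 API from your search code; the module 
name is made up.

```vb
Imports System
Imports Lucene.Net.Analysis.Standard
Imports Lucene.Net.QueryParsers
Imports Lucene.Net.Search

Module ShowParsedQuery
    Sub Main()
        ' Same analyzer and parser setup as in the search code.
        Dim anlz As New StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29)
        Dim parser As New QueryParser(Lucene.Net.Util.Version.LUCENE_29, "body", anlz)

        ' Parse the quoted phrase exactly as the search code does.
        Dim query As Query = parser.Parse("""I-A-05.50""")

        ' Prints the query as Lucene sees it, after analysis.
        Console.WriteLine(query.ToString("body"))
    End Sub
End Module
```

Whatever terms show up in that output are the ones that must exist, in that 
order, in the "body" field of a document for it to match.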

Could you provide some example data that you expect to match but that isn't 
returned by your IndexSearcher?

// Simon

-----Original Message-----
From: shane.bump...@ineos.com [mailto:shane.bump...@ineos.com] 
Sent: Monday, May 14, 2012 4:22 PM
To: lucene-net-user@lucene.apache.org
Subject: Question on basic functionality

I've been trying various things to make the indexing faster.  I've not 
been able to do successful searches when I don't do an optimize and commit 
after adding each document.  It does return a value, but not all of the values 
I'm expecting.  I've tried moving the commit to the end, which makes it a ton 
faster, but I expect to get 4 entries back and I only get one.  I suspect it 
has to do with the index being split into segments that are not merged at the 
end.  It's only indexing 2k records but it's taking around an hour on my dual 
core laptop.  I have tried using a ramdisk already to get rid of the I/O 
bottleneck if there is one, but it gave about the same result.  Any help would 
be appreciated.

Here's the basic code I'm using to add records:

        Dim dir As New Store.SimpleFSDirectory(New DirectoryInfo("c:\test"))
        Dim anlz As New StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29)
        Dim indx As New Index.IndexWriter(dir, anlz, True, IndexWriter.MaxFieldLength.UNLIMITED)
        Dim mergePolicy As MergePolicy
        Dim logPolicy As LogMergePolicy
        indx.SetUseCompoundFile(True)
        mergePolicy = indx.GetMergePolicy
        logPolicy = mergePolicy
        logPolicy.SetNoCFSRatio(1)
        indx.SetRAMBufferSizeMB(256)
        intC = 0
        Dim doc As New Documents.Document
        While dbrs.EOF = False
           intC = intC + 1
           getNextFile
           If intC = 1 Then
               doc.Add(New Documents.Field("id", intC, Documents.Field.Store.YES, Documents.Field.Index.NO))
               doc.Add(New Documents.Field("path", strFile, Documents.Field.Store.YES, Documents.Field.Index.NO))
               doc.Add(New Documents.Field("body", strBody, Documents.Field.Store.YES, Documents.Field.Index.ANALYZED))
           Else
               doc.GetField("id").SetValue(intC)
               doc.GetField("path").SetValue(strFile)
               doc.GetField("body").SetValue(strBody)
           End If
           indx.AddDocument(doc)
           indx.Optimize(1)
           indx.Commit()
        End While

Here's the code I'm using to search:
                    Dim dir As New Store.SimpleFSDirectory(New DirectoryInfo("c:\test"))
                    Dim IR As IndexReader = IndexReader.Open(dir, True)
                    Dim anlz As New StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29)
                    Dim parser As New QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "body", anlz)
                    Dim query As Search.Query
                    Dim searcher As IndexSearcher
                    Dim resultDocs As TopDocs
                    Dim hits() As ScoreDoc
                    Dim hit As ScoreDoc
                    Dim score As Double
                    Dim i As Integer
                    query = parser.Parse("""I-A-05.50""")
                    searcher = New IndexSearcher(IR)
                    resultDocs = searcher.Search(query, IR.MaxDoc())
                    Console.WriteLine("Found " & resultDocs.TotalHits & " results")
                    Dim doc As Documents.Document
                    hits = resultDocs.ScoreDocs
                    For Each hit In hits
                        doc = searcher.Doc(hit.Doc)
                        score = hit.Score
                        Console.WriteLine("Results num: " & i + 1 & " score: " & score)
                        Console.WriteLine("ID: " & doc.Get("id"))
                        Console.WriteLine("Path: " & doc.Get("path"))
                    Next
                    searcher.Close()
                    dir.Close()
