I have no examples prepared, but they can be easily created as questions occur.
Here's a very simple example that creates an in-memory index of three
documents, then reports the results of several searches. When run from the
command line, this is the result:
C:\>vb001
Query for cyan found 2 hits
color set 1
color set 2
Query for red but not green found 1 hits
color set 3
Query for red or blue or magenta found 3 hits
color set 3
color set 2
color set 1
---------------------------------------------------------------
Here's the program:
Imports Lucene.Net.Documents
Imports Lucene.Net.Index
Imports Lucene.Net.Search

Module Module1
    Sub Main()
        REM -- Create a simple in-memory index with three documents
        REM -- each document has name and color fields.
        Dim index As Lucene.Net.Store.RAMDirectory = _
            New Lucene.Net.Store.RAMDirectory()
        Dim analyzer As Lucene.Net.Analysis.Standard.StandardAnalyzer = _
            New Lucene.Net.Analysis.Standard.StandardAnalyzer()
        Dim writer As Lucene.Net.Index.IndexWriter = _
            New Lucene.Net.Index.IndexWriter(index, analyzer, True, _
                Lucene.Net.Index.IndexWriter.MaxFieldLength.UNLIMITED)
        Dim doc As Lucene.Net.Documents.Document

        doc = New Lucene.Net.Documents.Document()
        doc.Add(New Field("color", "red cyan green", Field.Store.YES, _
            Field.Index.TOKENIZED))
        doc.Add(New Field("name", "color set 1", Field.Store.YES, _
            Field.Index.TOKENIZED))
        writer.AddDocument(doc)

        doc = New Lucene.Net.Documents.Document()
        doc.Add(New Field("color", "cyan yellow magenta", Field.Store.YES, _
            Field.Index.TOKENIZED))
        doc.Add(New Field("name", "color set 2", Field.Store.YES, _
            Field.Index.TOKENIZED))
        writer.AddDocument(doc)

        doc = New Lucene.Net.Documents.Document()
        doc.Add(New Field("color", "blue yellow red", Field.Store.YES, _
            Field.Index.TOKENIZED))
        doc.Add(New Field("name", "color set 3", Field.Store.YES, _
            Field.Index.TOKENIZED))
        writer.AddDocument(doc)

        writer.Commit()
        writer.Close()

        REM ------------- Search the index
        Dim ixSearcher As IndexSearcher = New IndexSearcher(index)
        Dim qryParse As Lucene.Net.QueryParsers.QueryParser = _
            New Lucene.Net.QueryParsers.QueryParser("color", analyzer)
        Dim testQry As Query
        Dim hits As Hits

        testQry = qryParse.Parse("cyan")
        hits = ixSearcher.Search(testQry)
        Console.WriteLine("Query for cyan found " + hits.Length().ToString() + _
            " hits")
        Dim hitIterator As HitIterator = hits.Iterator
        Dim hitCurrent As Hit
        Dim foundDoc As Document
        While hitIterator.MoveNext = True
            hitCurrent = hitIterator.Current()
            foundDoc = hitCurrent.GetDocument()
            Console.WriteLine(" " + foundDoc.GetValues("name")(0))
        End While

        REM ------------- second search
        testQry = qryParse.Parse("red NOT green")
        hits = ixSearcher.Search(testQry)
        Console.WriteLine("Query for red but not green found " + _
            hits.Length().ToString() + " hits")
        hitIterator = hits.Iterator
        While hitIterator.MoveNext = True
            hitCurrent = hitIterator.Current()
            foundDoc = hitCurrent.GetDocument()
            Console.WriteLine(" " + foundDoc.GetValues("name")(0))
        End While

        REM ------------- third search
        testQry = qryParse.Parse("red OR blue OR magenta")
        hits = ixSearcher.Search(testQry)
        Console.WriteLine("Query for red or blue or magenta found " + _
            hits.Length().ToString() + " hits")
        hitIterator = hits.Iterator
        While hitIterator.MoveNext = True
            hitCurrent = hitIterator.Current()
            foundDoc = hitCurrent.GetDocument()
            Console.WriteLine(" " + foundDoc.GetValues("name")(0))
        End While

        ixSearcher.Close()
    End Sub
End Module
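
The three searches repeat the same steps, so they could be folded into one
helper. What follows is only a sketch, assuming the same Lucene.Net 2.x build
as the program above (one where Hits and Hit are still available). The names
Module2, AddColorSet and RunQuery are made up for illustration and are not
part of Lucene.Net; every Lucene.Net call is one already used above.

```vbnet
Imports System
Imports Lucene.Net.Documents
Imports Lucene.Net.Index
Imports Lucene.Net.Search

Module Module2

    REM -- Hypothetical helper: add one document with name and color fields.
    Sub AddColorSet(ByVal writer As IndexWriter, ByVal name As String, _
                    ByVal colors As String)
        Dim doc As Document = New Document()
        doc.Add(New Field("color", colors, Field.Store.YES, _
            Field.Index.TOKENIZED))
        doc.Add(New Field("name", name, Field.Store.YES, _
            Field.Index.TOKENIZED))
        writer.AddDocument(doc)
    End Sub

    REM -- Hypothetical helper: run one query and print the name of each hit.
    Sub RunQuery(ByVal searcher As IndexSearcher, _
                 ByVal parser As Lucene.Net.QueryParsers.QueryParser, _
                 ByVal label As String, ByVal queryText As String)
        Dim found As Hits = searcher.Search(parser.Parse(queryText))
        Console.WriteLine("Query for " & label & " found " & _
            found.Length().ToString() & " hits")

        REM -- Declared as IEnumerator so the cast to Hit is explicit.
        Dim it As System.Collections.IEnumerator = found.Iterator()
        While it.MoveNext()
            Dim current As Hit = DirectCast(it.Current, Hit)
            Console.WriteLine(" " & current.GetDocument().GetValues("name")(0))
        End While
    End Sub

    Sub Main()
        REM -- Same three documents as the original program.
        Dim index As Lucene.Net.Store.RAMDirectory = _
            New Lucene.Net.Store.RAMDirectory()
        Dim analyzer As Lucene.Net.Analysis.Standard.StandardAnalyzer = _
            New Lucene.Net.Analysis.Standard.StandardAnalyzer()
        Dim writer As IndexWriter = _
            New IndexWriter(index, analyzer, True, _
                IndexWriter.MaxFieldLength.UNLIMITED)
        AddColorSet(writer, "color set 1", "red cyan green")
        AddColorSet(writer, "color set 2", "cyan yellow magenta")
        AddColorSet(writer, "color set 3", "blue yellow red")
        writer.Commit()
        writer.Close()

        REM -- Same three queries, one call each.
        Dim searcher As IndexSearcher = New IndexSearcher(index)
        Dim parser As Lucene.Net.QueryParsers.QueryParser = _
            New Lucene.Net.QueryParsers.QueryParser("color", analyzer)
        RunQuery(searcher, parser, "cyan", "cyan")
        RunQuery(searcher, parser, "red but not green", "red NOT green")
        RunQuery(searcher, parser, "red or blue or magenta", _
            "red OR blue OR magenta")
        searcher.Close()
    End Sub
End Module
```

It is intended to behave like the original program; adding a fourth search
then takes one more RunQuery line.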
- Neal
-----Original Message-----
From: tony njedeh [mailto:[email protected]]
Sent: Thursday, January 07, 2010 4:30 PM
To: [email protected]
Subject: RE: Question
Hi Neal,
I would like to see the examples you have, using Lucene.NET from VB ?
Njedeh
--- On Thu, 1/7/10, Granroth, Neal V. <[email protected]> wrote:
From: Granroth, Neal V. <[email protected]>
Subject: RE: Question
To: "[email protected]" <[email protected]>
Date: Thursday, January 7, 2010, 3:05 PM
IFilter is a Microsoft COM interface implemented by components that extract
searchable content from a specific document format (Word, PDF, etc.).
Lucene.NET does not use these components directly; they are used by whatever
software you construct to populate the Lucene index with searchable content.
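
To make that division of labor concrete, here is a rough VB sketch. It is not
a real IFilter integration: ExtractText is a hypothetical placeholder for
whatever extraction component you use, and here it simply reads a plain-text
file so the sketch stays self-contained. The module name, the IndexFile helper
and the field names "path" and "content" are likewise my own assumptions. The
point is only that Lucene.NET receives text that has already been extracted.

```vbnet
Imports System
Imports Lucene.Net.Documents
Imports Lucene.Net.Index

Module ExtractThenIndex

    REM -- Hypothetical stand-in for an IFilter-based (or other) extractor.
    REM -- A real one would return the text content of a Word or PDF file.
    Function ExtractText(ByVal path As String) As String
        Return System.IO.File.ReadAllText(path)
    End Function

    REM -- Hand the extracted text to Lucene.NET as an ordinary field.
    Sub IndexFile(ByVal writer As IndexWriter, ByVal path As String)
        Dim doc As Document = New Document()
        REM -- The path is stored for display but is not searchable.
        doc.Add(New Field("path", path, Field.Store.YES, Field.Index.NO))
        REM -- The content is searchable but is not stored in the index.
        doc.Add(New Field("content", ExtractText(path), Field.Store.NO, _
            Field.Index.TOKENIZED))
        writer.AddDocument(doc)
    End Sub

    Sub Main(ByVal args() As String)
        Dim index As Lucene.Net.Store.RAMDirectory = _
            New Lucene.Net.Store.RAMDirectory()
        Dim analyzer As Lucene.Net.Analysis.Standard.StandardAnalyzer = _
            New Lucene.Net.Analysis.Standard.StandardAnalyzer()
        Dim writer As IndexWriter = _
            New IndexWriter(index, analyzer, True, _
                IndexWriter.MaxFieldLength.UNLIMITED)

        REM -- Each command-line argument is treated as a file to index.
        For Each path As String In args
            IndexFile(writer, path)
        Next

        writer.Commit()
        writer.Close()
        Console.WriteLine("Indexed " & args.Length.ToString() & " file(s)")
    End Sub
End Module
```

Swapping in a different extractor would only change ExtractText; the indexing
code stays the same.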
There is a lot of information on IFilter on Microsoft's site, and I think their
optional use is beyond the scope of the Lucene.NET project.
Would it help if I put together some simple examples of using Lucene.NET from
VB ?
- Neal
-----Original Message-----
From: Ed Jones [mailto:[email protected]]
Sent: Thursday, January 07, 2010 1:39 PM
To: [email protected]
Subject: RE: Question
Remember that not everyone uses C#; many people use VB.NET, and although it's
relatively simple to move it over to C#, moving from C# to Java is just one
extra step where things can go wrong.
At the time (3 years ago) I offered to spend time trying to make a set of
examples such as how to use iFilters (I think that was the term) but nobody was
interested so my attention moved elsewhere.
-----Original Message-----
From: Granroth, Neal V. [mailto:[email protected]]
Sent: 07 January 2010 19:37
To: [email protected]
Subject: RE: Question
I am very surprised by this comment.
There is so much similarity between Java and C# that I found absolutely no
difficulty with the discussion and examples in "Lucene in Action" and in
directly applying the techniques to my C#/.NET projects.
Maybe it would be helpful for some of those who find the Java examples
confusing to explain specifically why they are confusing. Then we might
consider putting together some type of short "Guide to understanding Lucene for
C# developers" or FAQ on the web site.
- Neal
-----Original Message-----
From: Ed Jones [mailto:[email protected]]
Sent: Thursday, January 07, 2010 3:57 AM
To: [email protected]
Subject: RE: Question
All I can say is that we found the lack of examples for .NET problematic:
when you are not up to speed with Java, there are a lot of basic hurdles to
overcome.
-----Original Message-----
From: Olivier Spinelli [mailto:[email protected]]
Sent: 07 January 2010 09:55
To: [email protected]
Subject: RE: Question
<quote>
Lucene.Net sticks to the APIs and classes used in the original Java
implementation of Lucene. The API names as well as class names are preserved
with the intention of giving Lucene.Net the look and feel of the C# language
and the .NET Framework. For example, the method Hits.length() in the Java
implementation now reads Hits.Length() in the C# port.
In addition to the APIs and classes port to C#, the algorithm of Java Lucene
is ported to C# Lucene. This means an index created with Java Lucene is
back-and-forth compatible with the C# Lucene; both at reading, writing and
updating. In fact a Lucene index can be concurrently searched and updated
using Java Lucene and C# Lucene processes.
</quote>
It's merely all about switching from camelCase to PascalCase...
HTH
Spi
-----Original Message-----
From: Ed Jones [mailto:[email protected]]
Sent: Thursday, 7 January 2010 10:27
To: [email protected]
Subject: RE: Question
My problem with Lucene in Action and all the examples on the internet is
that they were all in Java and you have to understand exactly what Java
is doing to understand it all properly. It's for this very reason we had
to shun using Lucene.net in major projects. I wanted dearly to use it
but the learning curve was far too steep and there appear to be very
few .NET code examples or other help.
Instead we have invested a significant amount of money in buying in a
much more commercial search engine.
I am keeping an eye on the Lucene.net project though, in case it can be
used in other parts of our business, but again the same will apply: we
will need more non-Java examples.
Ed
-----Original Message-----
From: Roger Chapman [mailto:[email protected]]
Sent: 07 January 2010 09:21
To: [email protected]
Subject: RE: Question
From what I can remember, the book Lucene in Action has a good section on
indexing documents and PDFs: http://www.manning.com/hatcher2/
Roger.
-----Original Message-----
From: Ben Martz [mailto:[email protected]]
Sent: 06 January 2010 19:51
To: [email protected]
Cc: <[email protected]>
Subject: Re: Question
Todd,
I would definitely take Michael's advice to learn more about the
overall issue before you get too far.
A quick answer that may help is that Windows does not ship with an iFilter
for PDF built-in. Installing Adobe Reader 8 or higher will install a
decent PDF iFilter.
I am a little surprised by your question though - I assume that you
have access to your own source code and could examine the result from
the iFilter that's being fed to the IndexWriter and compare the
behavior in the TXT case with the behavior in the PDF case?
Cheers,
Ben
Sent from my iPhone
On Jan 6, 2010, at 10:13, Michael Garski <[email protected]>
wrote:
> Todd,
>
> You'll need some way to extract the text from the PDF prior to
> indexing. I'm not familiar with any packages that can do that but I
> have heard of them. You may want to try searching the mailing list
> to see if there has been mention of one previously. Lucid
> Imagination hosts a great mailing list search tool at
> http://www.lucidimagination.com/search/
>
> Michael
>
> -----Original Message-----
> From: Todd McIndoo [mailto:[email protected]]
> Sent: Wednesday, January 06, 2010 10:11 AM
> To: [email protected]
> Subject: Question
>
> Sorry if this is duplicate
>
>
>
> We are using Lucene.net version 2.0.0.4. I am trying to search a document
> which contains lots of PDFs. I want to search a document, which contains a
> specific word, using Lucene.net. We are yielding results in text documents
> but not in PDF. Is there something we have to do to be able to search in
> PDF documents? All ifilters have been installed on the computer so I do
> not think that is the issue.
>
>
>
> Regards,
>
> SPEEDY SOLUTIONS
>
>
>
> Todd McIndoo
>