Wildcard queries don't work on untokenized fields (Lucene 2.2.0)

2007-07-12 Thread Michael Böckling
Hi! I just discovered that it is not possible to search in untokenized fields when using a wildcard query. The query code:KP00* becomes code:kp00* in its parsed form when it should really be code:KP00*, as tested in Luke using a whitespace analyzer. When omitting the wildcard character *, the

RE: Calling indexWriter.close() in web app

2007-07-12 Thread Ard Schrijvers
Hello, The lock file is only for Writers. The lock file ensures that even two writers from two JVMs will not step on each other. Readers do not care about what the writers are doing or whether there is a lock file... Is this always true? The deleteDocuments method of the IndexReader

Re: Calling indexWriter.close() in web app

2007-07-12 Thread Mark Miller
Sorry I was not clear on this. I just meant that you will not have any trouble opening a Reader on an index with a lock file. You may have trouble deleting with that Reader <g>. When you are using a Reader as a Writing Reader, i.e. deleting, then you pretty much have to consider it a Writer. I

Re: Wildcard queries don't work on untokenized fields (Lucene 2.2.0)

2007-07-12 Thread Mark Miller
On the QueryParser: /** * Whether terms of wildcard, prefix, fuzzy and range queries are to be automatically * lower-cased or not. Default is <code>true</code>. */ public void setLowercaseExpandedTerms(boolean lowercaseExpandedTerms) { this.lowercaseExpandedTerms =
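
A minimal plain-Java sketch of the effect discussed in this thread, not Lucene code: an untokenized field stores its terms verbatim, so a prefix that has been lower-cased no longer matches an upper-case term. The class `PrefixMatchSketch` and its `match` method are hypothetical names used only for illustration; in Lucene itself the switch is the `setLowercaseExpandedTerms` method quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the term dictionary of an untokenized field.
class PrefixMatchSketch {

    // Returns the stored terms that start with the prefix.
    // When lowercaseExpandedTerms is true the prefix is lower-cased first,
    // mimicking the default parser behaviour described in the thread.
    static List<String> match(List<String> storedTerms, String prefix,
                              boolean lowercaseExpandedTerms) {
        String p = lowercaseExpandedTerms ? prefix.toLowerCase(Locale.ROOT) : prefix;
        List<String> hits = new ArrayList<>();
        for (String term : storedTerms) {
            if (term.startsWith(p)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Terms stored exactly as indexed (no analysis, case preserved).
        List<String> terms = Arrays.asList("KP00123", "KP00456", "XY99");
        System.out.println(match(terms, "KP00", true));   // prefix becomes "kp00"
        System.out.println(match(terms, "KP00", false));  // prefix stays "KP00"
    }
}
```

With lower-casing on, the prefix `kp00` finds nothing among the case-preserved terms; with it off, both `KP00...` terms match.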

Re: Payloads and PhraseQuery

2007-07-12 Thread Peter Keegan
I'm looking for Spans.getPositions(), as shown in BoostingTermQuery, but neither NearSpansOrdered nor NearSpansUnordered (which are the Spans provided by SpanNearQuery) provide this method and it's not clear to me how to add it. Peter On 7/11/07, Chris Hostetter [EMAIL PROTECTED] wrote: :

Re: Payloads and PhraseQuery

2007-07-12 Thread Grant Ingersoll
That is off of the TermSpans class. BTQ (BoostingTermQuery) is implemented to extend SpanQuery, thus SpanNearQuery isn't, w/o modification, going to have access to these things. However, if you look at the SpanTermQuery, you will see that its implementation of Spans is indeed the

checking existing docs before indexing

2007-07-12 Thread Heba Farouk
Hello, I'm a newbie to the Lucene world and I hope that you can help me. Is there any option in IndexWriter to check whether a document already exists before adding it to the index, or should I maintain that manually? Thanks in advance. Yours, Heba -

replace values in index

2007-07-12 Thread Jeff
I have documents with lots of text. Part of the text is in the following format: word1,word2,word3,word4,word5 I am currently using the StandardAnalyzer and everything is working great with the other data, except I can't query for 'word3' as a ',' isn't a token separator. Is there an easy way

Re: replace values in index

2007-07-12 Thread Mark Miller
While it is possible to alter the StandardAnalyzer, depending on more details of your source text, it may be better to use a different analyzer or make your own. The StandardAnalyzer is quite slow if you do not need all of its features, and modifying it will make it harder to keep up with bug
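
As a rough illustration of the "make your own" route, here is a plain-Java sketch, not a Lucene Analyzer, that treats commas as well as whitespace as token separators and lower-cases the result. The class `CommaTokenizerSketch` and its `tokenize` method are hypothetical; a real solution would wrap the same splitting rule in a custom analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical tokenizer: splits on runs of commas and/or whitespace.
class CommaTokenizerSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[,\\s]+")) {
            // split() can yield an empty first element when the text
            // starts with a separator, so skip empties.
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("word1,word2,word3,word4,word5"));
    }
}
```

With this rule each of the five words becomes its own token, so a query for `word3` has something to match.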

Re: checking existing docs before indexing

2007-07-12 Thread Erick Erickson
You have to check yourself. Lucene has no concept of relations *between* documents. What you're really asking for is something like a database unique key. No such luck, you have to create one yourself. What I've done is post-process the entire index, removing duplicates. This can be done quite

Re: checking existing docs before indexing

2007-07-12 Thread Neeraj Gupta
Hi, You can use the updateDocument() method of IndexWriter to update any existing document. It searches for a document matching the Term; if the document exists, it deletes that document. After that it adds the provided document to the index in both cases, whether the document existed or not.
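
The delete-then-add behaviour described above can be modelled in a few lines of plain Java. This is only an in-memory sketch of the semantics, not the IndexWriter implementation; the class `UpdateSemanticsSketch`, its fields, and the use of a string id as the matching key are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory "index" where each document is {id, body}.
class UpdateSemanticsSketch {

    private final List<String[]> docs = new ArrayList<>();

    // Delete every document whose id matches, then add the new one.
    // The add happens whether or not anything was deleted.
    void updateDocument(String id, String body) {
        docs.removeIf(d -> d[0].equals(id));
        docs.add(new String[] {id, body});
    }

    int size() {
        return docs.size();
    }

    // Returns the body stored under the id, or null if there is none.
    String bodyOf(String id) {
        for (String[] d : docs) {
            if (d[0].equals(id)) {
                return d[1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        UpdateSemanticsSketch index = new UpdateSemanticsSketch();
        index.updateDocument("42", "first version");
        index.updateDocument("42", "second version"); // replaces, does not duplicate
        index.updateDocument("7", "another document");
        System.out.println(index.size() + " documents, id 42 = " + index.bodyOf("42"));
    }
}
```

Because the add is unconditional, calling the update with a unique key field for every document keeps the index free of duplicates without a separate existence check.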

Re: checking existing docs before indexing

2007-07-12 Thread Samuel LEMOINE
Neeraj Gupta wrote: Hi, You can use the updateDocument() method of IndexWriter to update any existing document. It searches for a document matching the Term; if the document exists, it deletes that document. After that it adds the provided document to the index in both cases, whether

How to reflect index changes to search automatically

2007-07-12 Thread Sonu SR
Hi, I have SearchServer and SearchClient programs. The SearchServer uses RemoteSearchable to bind the indices on the servers. The SearchClient uses ParallelMultiSearcher to search the indices. The problem is that I have to restart the search servers to reflect the index change in

Re: How to reflect index changes to search automatically

2007-07-12 Thread Erick Erickson
In general, searchers cannot see changes to an index without restarting, so I suspect that the answer is no. This is entirely independent of remote, parallel, etc. Erick On 7/12/07, Sonu SR [EMAIL PROTECTED] wrote: Hi, I have SearchServer and SearchClient programs. The SearchServer

RE: Unable to set up CLASSPATH

2007-07-12 Thread Yom Chouloute
I am trying to get Lucene installed on a Red Hat server but I am having issues setting up the classpath. Here are my steps: I downloaded the zip files from the Lucene web site. I extracted them to a folder titled /var/www/html/lucene. I ran ant war-demo, which was able to create a build

Re: Payloads and PhraseQuery

2007-07-12 Thread Paul Elschot
On Thursday 12 July 2007 14:50, Grant Ingersoll wrote: That is off of the TermSpans class. BTQ (BoostingTermQuery) is implemented to extend SpanQuery, thus SpanNearQuery isn't, w/o modification, going to have access to these things. However, if you look at the SpanTermQuery, you will

Customizing Stop Word List?

2007-07-12 Thread Michael Barbarelli
Hello to All, I'm having a problem with Lucene where certain words that I would like to be included in the query are actually being omitted from it. I think that is because Lucene recognizes them as stop words. This is the case with roughly four terms in particular. They look like

Re: How to reflect index changes to search automatically

2007-07-12 Thread jafarim
With local indices, it is enough to reopen the IndexSearcher by calling close() and then renew the IndexSearcher object. How about RemoteSearchers? Is it necessary to re-initialize remote search server? --jaf On 7/12/07, Erick Erickson [EMAIL PROTECTED] wrote: In general, searchers cannot

Re: Payloads and PhraseQuery

2007-07-12 Thread Grant Ingersoll
Yep, totally agree. One way to handle this, initially at least, is to have isPayloadAvailable() only return true for the SpanTermQuery. The other option is to come up with some modification of the suggested methods below to return all the payloads in a span. I have a basic implementation

Does Index have a Tokenizer Built into it

2007-07-12 Thread John Paul Sondag
Hi, When Lucene's standard Indexer is used to store documents, does it store the information about the tokens in any way? I'm playing around with making a Snippet Generator (like the highlighter class), and it is going to involve a very large number of documents. For my test cases I have only

Re: Customizing Stop Word List?

2007-07-12 Thread Chris Hostetter
: So far, I have attempted to fix this problem by defining my own list of stop words and passing that array onto a standard analyzer used for both indexing and searching. That didn't work. Would a per-field analyzer work in this
that is the correct way to change your stop word set ...

Re: Customizing Stop Word List?

2007-07-12 Thread Michael Barbarelli
Hello Hoss. Cheers for your response. Much appreciated. typically the act of writing this sample code helps you spot where you may be doing something wrong in your application Fair enough point. Unfortunately, I won't be able to post any sample code until I return to my home office. Will

Re: Payloads and PhraseQuery

2007-07-12 Thread Peter Keegan
Grant, If/when you have an implementation for SpanNearQuery, I'd be happy to test it. Peter On 7/12/07, Grant Ingersoll [EMAIL PROTECTED] wrote: Yep, totally agree. One way to handle this, initially at least, is to have isPayloadAvailable() only return true for the SpanTermQuery. The other

Re: Payloads and PhraseQuery

2007-07-12 Thread Chris Hostetter
: That is off of the TermSpans class. BTQ (BoostingTermQuery) is ...
: I am not completely sure here, but it seems like we may need an efficient way to access the TermPositions for each document. That is, the Spans class doesn't provide this and maybe it should ...
: I'm

Re: Lucene RAM Directory doesn't work for Index Size 8 GB

2007-07-12 Thread Doron Cohen
Hi Murali, I found a casting issue that can cause this problem; see the patch in http://issues.apache.org/jira/browse/LUCENE-957. Is the problem solved with this patch? Doron muraalee [EMAIL PROTECTED] wrote on 09/07/2007 14:15:04: Hi, We are facing a strange problem with RAMDirectory for

Re: Payloads and PhraseQuery

2007-07-12 Thread Grant Ingersoll
On Jul 12, 2007, at 6:12 PM, Chris Hostetter wrote: Hmm... okay so the issue is that in order to get the payload data, you have to have a TermPositions instance. instead of adding getPayload methods to the Spans class (which as Paul points out, can have nesting issues) perhaps more general