sub search

2006-03-07 Thread Anton Potehin
Is it possible to search within the results of a previous search? For example, I ran a search: Searcher searcher = ... Query query = ... Hits hits = searcher.search(query); After that I do not want to run a new search; I want to search among the found

Re: sub search

2006-03-07 Thread hu andy
2006/3/7, Anton Potehin [EMAIL PROTECTED]: Is it possible to search within the results of a previous search? For example, I ran a search: Searcher searcher = ... Query query = ... Hits hits = searcher.search(query); After that I do not want to run a new search,

Re: Distributed Lucene..

2006-03-07 Thread Andrzej Bialecki
Prasenjit Mukherjee wrote: I think Nutch has a distributed Lucene implementation. I could have used Nutch straight away, but I have a different crawler, and I also don't want to use NDFS (which is used by Nutch). What I proposed earlier is basically based on the MapReduce paradigm, which is used

RE: sub search

2006-03-07 Thread anton
As far as I understand, that will run a new search across the whole index. But what is the difference between that and the search described below: TermQuery termQuery = new TermQuery( BooleanQuery bq = .. bq.add(termQuery,true,false); bq.add(query,true,false); hits = searcher.search(bq,queryFilter);

about lucene 1.9

2006-03-07 Thread Haritha_Parvatham
Hi, I have downloaded the latest release, Lucene 1.9, and deployed it in Tomcat. When I search from the front end, it gives me the message below. Please tell me how to use Lucene 1.9. Welcome to the Lucene Template application. (This is the header) Document Summary null

Writing terms/freq pairs directly to the inverted file

2006-03-07 Thread Murat Yakici
Hi, I would like to bypass the IndexWriter and write the terms and their frequencies directly to the index (and maybe proximity info later on). I might have missed a previous discussion of this. As far as I know, the high-level API in Lucene only allows you to add documents (which are populated

Re: MultiPhraseQuery

2006-03-07 Thread Erik Hatcher
On Mar 7, 2006, at 2:35 AM, Eric Jain wrote: Daniel Naber wrote: Please try to add this to MultiPhraseQuery and let us know if it helps: public List getTerms() { return termArrays; } That is indeed all I need (the list wouldn't have to be mutable though). Any chance this could be

Re: sub search

2006-03-07 Thread hu andy
It uses a caching mechanism. The details are described in the book Lucene in Action. Maybe you can test it to see which is faster. 2006/3/7, [EMAIL PROTECTED] [EMAIL PROTECTED]: As far as I understand, that will run a new search across the whole index. But what is the difference between that and the search

Question

2006-03-07 Thread Thomas Papke
Hello, has anyone implemented the Google Suggest feature using Lucene? The front end is clear, but I need a very fast way to retrieve matching terms. For example: the user typed Ab and I want to give him a list of 10 possible words in term name starting with Ab*. So I don't need the whole document

RE: Question

2006-03-07 Thread Pasha Bizhan
Hi, From: Thomas Papke [mailto:[EMAIL PROTECTED] Has anyone implemented the Google Suggest feature using Lucene? The front end is clear, but I need a very fast way to retrieve matching terms. For example: the user typed Ab and I want to give him a list of 10 possible words in term name

Lucene version 1.9

2006-03-07 Thread WATHELET Thomas
I've created an index with Lucene version 1.9, and when I try to open this index I always get this error message: java.lang.ArrayIndexOutOfBoundsException. If I use an index built with Lucene version 1.4.3, it works. What's wrong?

RE: Question

2006-03-07 Thread Pasha Bizhan
Hi, From: Leon Chaddock [mailto:[EMAIL PROTECTED] I am very interested in this as well, as I wish to display related searches for users. What does related mean? Does anyone know if this work is open source and whether there is an API available? Ask David or use web.archive:

Re: Question

2006-03-07 Thread Jeff Rodenburg
We've done this, and it's not that complex. (Sorry, client won't allow me to release the code.) It's AJAX on the front end, so that background call is simply executing a search against an index that consists of the aggregated search terms. We do wildcard queries to get the results we want. For
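
A minimal sketch of the prefix lookup discussed in this thread, assuming the aggregated search terms fit in memory. It uses only java.util and no Lucene classes; the class name PrefixSuggester and its methods are hypothetical, and matching is case-sensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: suggests terms by prefix from a sorted set.
class PrefixSuggester {
    private final TreeSet<String> terms = new TreeSet<>();

    void add(String term) {
        terms.add(term);
    }

    // Returns up to `limit` terms that start with `prefix`, in sorted order.
    List<String> suggest(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        // tailSet begins at the first term >= prefix; all terms sharing the
        // prefix are contiguous, so stop at the first term that does not match.
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix) || out.size() >= limit) {
                break;
            }
            out.add(t);
        }
        return out;
    }
}
```

With the terms Abacus, Abbey and Acorn loaded, suggest("Ab", 10) returns Abacus and Abbey. A sorted set keeps all terms that share a prefix next to each other, which is why the loop can stop at the first non-match instead of scanning everything.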

Re: sub search

2006-03-07 Thread Erik Hatcher
On Mar 7, 2006, at 7:03 AM, hu andy wrote: It uses cache mechanism. The detail is described in the book Lucene in Action. Maybe you can test it to decide which is faster Major caveat here is that the caching QueryFilter employs really only works if you use the same instance of QueryFilter

Re: Get only count

2006-03-07 Thread Eric Jain
Anton Potehin wrote: Currently I run a new search to get the number of results. For example: IndexSearcher is = ... Query q = ... numberOfResults = is.search(q).length(); Can I speed this up? And how? Perhaps something like: class CountingHitCollector extends HitCollector {

Re: sub search

2006-03-07 Thread Eric Jain
Anton Potehin wrote: After that I do not want to run a new search; I want to search among the found results... Perhaps something like this would work: final BitSet results = toBitSet(hits); searcher.search(newQuery, new Filter() { public BitSet bits(IndexReader reader) { return results;
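
The filtering idea in this reply boils down to intersecting two sets of document ids. A rough sketch with plain java.util.BitSet follows; the class SubSearch and its methods are stand-ins invented for illustration and are not Lucene API.

```java
import java.util.BitSet;

// Stand-in for the filtering idea; document ids are plain ints, no Lucene types are used.
class SubSearch {
    // Builds a bit set with one bit set per matching document id.
    static BitSet toBitSet(int[] docIds) {
        BitSet bits = new BitSet();
        for (int id : docIds) {
            bits.set(id);
        }
        return bits;
    }

    // Keeps only the hits of the second search that were also in the first result set.
    static BitSet within(BitSet previous, BitSet current) {
        BitSet out = (BitSet) current.clone(); // leave the caller's sets untouched
        out.and(previous);
        return out;
    }
}
```

If the first search matched documents 1, 3, 5, 8 and the second matched 3, 4, 8, 9, within(first, second) leaves documents 3 and 8. The second query is still evaluated; the bit set only restricts which of its hits are kept.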

Unreported IOException received for SpanTermQuery class

2006-03-07 Thread Murat Yakici
Hi, I was building the Lucene 1.9.1 source code. I have received the following error msg: Unreported exceptions: java.io.IOException must be caught or declared to be thrown. in class SpanOrQuery, line number 154. Any ideas how to resolve it? Regards, Murat

RE: Get only count

2006-03-07 Thread anton
Why did you add if (score > 0.0f)? The Javadoc contains the line: HitCollector.collect(int,float) is called for every non-zero scoring. -Original Message- From: Eric Jain [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 07, 2006 5:08 PM To: java-user@lucene.apache.org Subject: Re: Get only count

Re: Lucene version 1.9

2006-03-07 Thread Paul Elschot
Thomas, On Tuesday 07 March 2006 13:57, WATHELET Thomas wrote: I've created an index with Lucene version 1.9, and when I try to open this index I always get this error message: java.lang.ArrayIndexOutOfBoundsException. If I use an index built with Lucene version 1.4.3, it works.

Re: Get only count

2006-03-07 Thread Yonik Seeley
On 3/7/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Why did you add if (score > 0.0f)? The Javadoc contains the line: HitCollector.collect(int,float) is called for every non-zero scoring. That should probably read "is called for every matching document". -Yonik

Re: Unreported IOException received for SpanTermQuery class

2006-03-07 Thread Paul Elschot
On Tuesday 07 March 2006 15:35, Murat Yakici wrote: Hi, I was building the Lucene 1.9.1 source code. I have received the following error msg: Unreported exceptions: java.io.IOException must be caught or declared to be thrown. in class SpanOrQuery, line number 154. Any ideas how to

RE: Get only count

2006-03-07 Thread anton
Can a matching document have a score equal to zero? -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 07, 2006 6:20 PM To: java-user@lucene.apache.org Subject: Re: Get only count Importance: High On 3/7/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Why

indexing problems

2006-03-07 Thread Apache Lucene
Hi, I am using Lucene 1.9.1 to index files. The index writer created the following files: (1) the segments file, segments; (2) the deletable file, deletable; (3) a compound file, cfs. None of the other files (term info, frequency, etc.) were created. Is there something obvious I am doing wrong?

Re: Unreported IOException received for SpanTermQuery class

2006-03-07 Thread Murat Yakici
The compiler is Sun Java 1.4.2_08. Paul Elschot wrote: On Tuesday 07 March 2006 15:35, Murat Yakici wrote: Hi, I was building the Lucene 1.9.1 source code. I have received the following error msg: Unreported exceptions: java.io.IOException must be caught or declared to be thrown. in

Re: indexing problems

2006-03-07 Thread Yonik Seeley
You are using the compound file format (the default since 1.4) and the .cfs file contains all those individual parts. -Yonik On 3/7/06, Apache Lucene [EMAIL PROTECTED] wrote: Hi, I am using Lucene 1.9.1 to index the files. The index writer created the following files (1) segment file

Re: Get only count

2006-03-07 Thread Yonik Seeley
On 3/7/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Can a matching document have a score equal to zero? Yes. Scorers don't generally use score to determine if a document matched the query. Scores <= 0.0f are currently screened out at the top level search functions, but not when you use a
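
To make the point about zero scores concrete, here is a small sketch of a counting collector. DocCollector is a stand-in written for this example that only mirrors the shape of a collect(doc, score) callback; it is not the Lucene class, and the skipNonPositive flag is an assumption added to show both behaviours side by side.

```java
// Stand-in mirroring the shape of a collect(doc, score) callback; not the Lucene class.
abstract class DocCollector {
    abstract void collect(int doc, float score);
}

// Counts collected documents, optionally ignoring scores <= 0.0f.
class CountingCollector extends DocCollector {
    private final boolean skipNonPositive;
    private int count;

    CountingCollector(boolean skipNonPositive) {
        this.skipNonPositive = skipNonPositive;
    }

    @Override
    void collect(int doc, float score) {
        if (skipNonPositive && score <= 0.0f) {
            return; // mimic the screening described for the top-level search functions
        }
        count++;
    }

    int count() {
        return count;
    }
}
```

Fed three matching documents with scores 1.5, 0.0 and 0.2, a collector created with skipNonPositive set to false counts 3, while one created with true counts 2. Which number is the right "count" depends on whether a zero-scoring match should be reported.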

Re: indexing problems

2006-03-07 Thread Apache Lucene
Is it advisable to use the compound file format, or should I revert to the simple file format? How do I revert? Thanks, lucenenator On 3/7/06, Yonik Seeley [EMAIL PROTECTED] wrote: You are using the compound file format (the default since 1.4) and the .cfs file contains all those

Re: Unreported IOException received for SpanTermQuery class

2006-03-07 Thread Paul Elschot
On Tuesday 07 March 2006 16:34, Murat Yakici wrote: The compiler is Sun Java 1.4.2_08. I'm using sun javac 1.5.0_01 and this compiles the current trunk without any problems, so I cannot reproduce the error msg. The common-build.xml file uses source and target 1.4 for javac, (in the compile

Re: indexing problems

2006-03-07 Thread Erik Hatcher
On Mar 7, 2006, at 10:41 AM, Apache Lucene wrote: Is it advisable to use compound file format? or should I revert it back to simple file format? How do I revert it back? There is a setter on IndexWriter to set it back if you like. The compound format avoids the issues that cropped up a

Re: Unreported IOException received for SpanTermQuery class

2006-03-07 Thread Murat Yakici
Yeah, I know, sorry for that. The reason is, first I tried to solve the problem by wrapping the line with a try-catch block. Then, the next build gave the same error for SpanTermQuery and some other classes. I will try to compile that on 1.5.0_01. Thanks, Murat Paul Elschot wrote: On

Classification / Change Scoring during search

2006-03-07 Thread Rainer Dollinger
Hello, I want to use Lucene to get similar documents based on a Boolean Query (similar metadata with OR clauses) and ratings of the user for already searched documents. I intend to implement a Naive Bayes classifier to categorize documents into liked/disliked classes and would do this by using a

Re: indexing problems

2006-03-07 Thread Apache Lucene
This line is throwing a null pointer exception for the index I created as I mentioned in my previous emails. searcher = new IndexSearcher(IndexReader.open(indexPath) ); Any ideas? I made sure the indexPath is a valid path. thanks, lucenenator On 3/7/06, Erik Hatcher [EMAIL PROTECTED] wrote:

RE: Using NOT queries inside parentheses

2006-03-07 Thread Satuluri, Venu_Madhav
Query at = new TermQuery(new Term(alwaysTrueField,true)); Query user = queryParser.parse(userInput); if (user instanceof BooleanQuery) { BooleanQuery bq = (BooleanQuery)user; if (! usableBooleanQuery(bq)) { bq.add(at, true, false); /* add 'always true' clause

Re: indexing problems

2006-03-07 Thread Apache Lucene
BTW, I could access that index using Luke. It works fine. On 3/7/06, Apache Lucene [EMAIL PROTECTED] wrote: This line is throwing a null pointer exception for the index I created as I mentioned in my previous emails. searcher = new IndexSearcher(IndexReader.open(indexPath) ); Any ideas?

Scoring with FunctionQueries?

2006-03-07 Thread Sebastian Marius Kirsch
Hello, I have been trying out Yonik's excellent FunctionQuery (from Solr), but am having some problems regarding the scoring of FunctionQueries in conjunction with other queries. I am currently researching a data fusion approach, where you have several separate scores for a document and combine

Re: Lucene version 1.9

2006-03-07 Thread Doug Cutting
WATHELET Thomas wrote: I've created an index with the Lucene version 1.9 and when I try to open this index I have always this error mesage: java.lang.ArrayIndexOutOfBoundsException. if I use an index built with the lucene version 1.4.3 it's working. Wath's wrong? Are you perhaps trying to open

Re: Throughput doesn't increase when using more concurrent threads

2006-03-07 Thread Peter Keegan
I ran a query performance tester against 8-CPU and 16-CPU Xeon servers (16/32 CPU hyperthreaded) on Linux. Here are the results: 8-CPU: 275 qps; 16-CPU: 305 qps (the dual-core Opteron servers are still faster). Here is the stack trace of 8 of the 16 query threads during the test: at

Re: Throughput doesn't increase when using more concurrent threads

2006-03-07 Thread Doug Cutting
Peter Keegan wrote: I ran a query performance tester against 8-CPU and 16-CPU Xeon servers (16/32 CPU hyperthreaded) on Linux. Here are the results: 8-CPU: 275 qps; 16-CPU: 305 qps (the dual-core Opteron servers are still faster). Here is the stack trace of 8 of the 16 query threads during the

Re: Distributed Lucene..

2006-03-07 Thread Otis Gospodnetic
Hi, Just curious about this: We hacked :-) IndexWriter of Lucene to start all segment names with a prefix unique for each small index part. Then, when adding it to the actual index, we simply copy the new segment to the folder with the other segments, and add it in such a way so that the

Weighted Terms Per Document

2006-03-07 Thread Matthew O'Connor
Hello, I'm using Lucene 1.9 to replace an in-house search engine where all of the documents to be searched are also created in-house. One of the features of the search engine is something called 'xtras' which are associated with the documents. I am wondering how best to model this feature using

Re: Using NOT queries inside parentheses

2006-03-07 Thread Daniel Noll
Satuluri, Venu_Madhav wrote: If you want this to work, the most elegant way I've found is to override the getBooleanQuery(Vector) method in QueryParser and insert a MatchAllDocsQuery into the boolean query if every clause is prohibited. Daniel I tried this, but it looks like the overridden

Lucene 1.9.1 and timeToString() apparent incompatibility with 1.4.3

2006-03-07 Thread Victor Negrin
I recently converted from Lucene 1.4.3 to 1.9.1 and in the process replaced all deprecated classes with the new ones, as recommended (for forward compatibility with Lucene 2.0). This, however, seems to introduce an incompatibility when the new timeToString() and stringToTime() methods are used.

Re: sub search

2006-03-07 Thread Daniel Noll
Anton Potehin wrote: Is it possible to make search among results of previous search? [removed all the double spacing] After it I want to not make a new search, I want to make search among found results... Simple. Create a new BooleanQuery and put the original query into it, along with the

Re: BooleanQuery$TooManyClauses with 1.9.1 when Number RangeQuery

2006-03-07 Thread Youngho Cho
Hello, : I upgraded to 1.9.1 and reindexed. : I used NumberTools when indexing the numbers. : : After the upgrade I got the following error on a number range query. : with query The possibility of a TooManyClauses exception has always existed with RangeQuery and numbers, even when using

Re: BooleanQuery$TooManyClauses with 1.9.1 when Number RangeQuery

2006-03-07 Thread Chris Hostetter
: You mean theoretically : RangeQuery should be forbidden because it is always a potential time bomb? : Should we note it in the Javadoc? In my opinion, the only reason to use RangeQuery is if you are dealing with very controlled ranges, where you know the number of terms it will expand to is
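
A rough way to see the risk described here: a range expands to one clause per distinct indexed term inside it, so the clause count depends on the data and not on the query text. The sketch below counts those terms with a java.util.TreeSet. RangeExpansion and its methods are hypothetical names; the constant assumes the default BooleanQuery limit of 1024 clauses, and the zero-padded strings only stand in for encoded numbers.

```java
import java.util.TreeSet;

// Illustration only: estimates how many clauses a term range would expand to.
class RangeExpansion {
    // Assumed limit for this sketch; 1024 is the default maximum for BooleanQuery.
    static final int MAX_CLAUSES = 1024;

    // Number of distinct indexed terms inside [lower, upper], both ends inclusive.
    static int clausesNeeded(TreeSet<String> terms, String lower, String upper) {
        return terms.subSet(lower, true, upper, true).size();
    }

    // True when the expansion would need more clauses than the limit allows.
    static boolean wouldOverflow(TreeSet<String> terms, String lower, String upper) {
        return clausesNeeded(terms, lower, upper) > MAX_CLAUSES;
    }
}
```

With 5000 distinct zero-padded numbers indexed, a range covering ten of them needs ten clauses and is fine, while a range covering all of them exceeds the limit. The same query that works on a small index can therefore start failing as the index grows.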

Re: Lucene 1.9.1 and timeToString() apparent incompatibility with 1.4.3

2006-03-07 Thread Chris Hostetter
: timeToString() and stringToTime() methods are used. Using an index created : with 1.4.3 and searched with 1.9.1 I now receive the following errors: As the deprecation comment in DateField says... If you build a new index, use DateTools instead. For existing indices you can

Re: Lucene 1.9.1 and timeToString() apparent incompatibility with 1.4.3

2006-03-07 Thread George Washington
Thanks Chris for making it clear, I had read the comment but I had not understood that it implied incompatibility. But will the code be preserved in Lucene 2.0, in light of the comment contained in the Lucene 1.9.1 announcement ? QUOTE Applications must compile against 1.9 without deprecation

RE: Distributed Lucene..

2006-03-07 Thread Andrew Schetinin
Hi, Sure not. We created another IndexWriter class and modified its function addIndexes (if I remember the function name correctly) so it will not call to optimize at the end - that's all. Having unique segment names was necessary because the segment file name is used inside the file itself, and