How can I merge .cfx and .cfs into a single cfs file?

2010-05-05 Thread 张志田
Hi all, I have an indexing task that indexes thousands of records with Lucene 3.0.1. What confuses me is that Lucene always creates a .cfx and a .cfs file in the file system, sometimes more, whereas I thought it would create a single .cfs file if I optimized the index. Is this by design? If yes,

AW: How can I merge .cfx and .cfs into a single cfs file?

2010-05-05 Thread Uwe Goetzke
Index all into a directory and determine the size of all files in it. From http://lucene.apache.org/java/3_0_1/fileformats.html Starting with Lucene 2.3, doc store files (stored field values and term vectors) can be shared in a single set of files for more than one segment. When compound file

Re: Using IndexReader in the web environment

2010-05-05 Thread Ian Lea
You could tell the searching part of your app, via some notification or messaging call. Or call IndexReader.isCurrent() from time to time, or even on every search, and reopen() if necessary. See the javadocs and don't forget to close the old reader when you do call reopen. -- Ian. On Wed,
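
The isCurrent()/reopen() pattern described in this reply can be sketched roughly as follows. This is a minimal illustration, not Lucene code: the `Reader` interface is a hypothetical stand-in for the three IndexReader methods the post names, and `ReaderHolder` and `FakeReader` are invented here only to make the sketch runnable without Lucene on the classpath. It assumes reopen() returns the same instance when nothing has changed.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ReaderRefreshSketch {

    /** Hypothetical stand-in for the IndexReader methods named in the post. */
    interface Reader {
        boolean isCurrent();
        Reader reopen();   // assumed to return this when the index is unchanged
        void close();
    }

    /** Hands out a reader, refreshing it first when the index has changed. */
    static final class ReaderHolder {
        private Reader current;

        ReaderHolder(Reader initial) {
            this.current = initial;
        }

        synchronized Reader get() {
            if (!current.isCurrent()) {
                Reader fresh = current.reopen();
                if (fresh != current) {
                    current.close();   // close the old reader after a reopen
                    current = fresh;
                }
            }
            return current;
        }
    }

    /** In-memory fake used only to exercise the holder; not a Lucene class. */
    static final class FakeReader implements Reader {
        private final AtomicLong indexVersion;
        private final long openedAt;
        boolean closed = false;

        FakeReader(AtomicLong indexVersion) {
            this.indexVersion = indexVersion;
            this.openedAt = indexVersion.get();
        }

        public boolean isCurrent() {
            return openedAt == indexVersion.get();
        }

        public Reader reopen() {
            return isCurrent() ? this : new FakeReader(indexVersion);
        }

        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) {
        AtomicLong version = new AtomicLong(1);
        FakeReader first = new FakeReader(version);
        ReaderHolder holder = new ReaderHolder(first);

        // Unchanged index: the holder returns the same reader.
        System.out.println(holder.get() == first);

        // Simulate a commit by the indexing side, then ask again.
        version.incrementAndGet();
        Reader second = holder.get();
        System.out.println(second != first && first.closed);
    }
}
```

The holder closes the old reader as soon as it swaps in the new one, which is only safe if no search is still running against the old reader. A real web application would likely need some form of reference counting before closing.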

Re: How can I merge .cfx and .cfs into a single cfs file?

2010-05-05 Thread Michael McCandless
Lucene considers an index with a single .cfx and a single .cfs as optimized. Also, note that how Lucene stores files in the index is an impl detail -- it can change from release to release -- so relying on any of these details is dangerous. That said, with recent Lucene versions, if you really

Re: Relevancy Practices

2010-05-05 Thread Grant Ingersoll
On May 2, 2010, at 5:50 AM, Avi Rosenschein wrote: On 4/30/10, Grant Ingersoll gsing...@apache.org wrote: On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote: Also, tuning the algorithms to the users can be very important. For instance, we have found that in a basic search functionality,

Re: Relevancy Practices

2010-05-05 Thread Grant Ingersoll
Thanks, Peter. Can you share what kind of evaluations you did to determine that the end user believed the results were equally relevant? How formal was that process? -Grant On May 3, 2010, at 11:08 AM, Peter Keegan wrote: We discovered very soon after going to production that Lucene's

Re: Relevancy Practices

2010-05-05 Thread Peter Keegan
The feedback came directly from customers and customer facing support folks. Here is an example of a query with keywords: nurse, rn, nursing, hospital. The top 2 hits have scores of 26.86348 and 26.407215. To the customer, both results were equally relevant because all of their keywords were in

Re: Relevancy Practices

2010-05-05 Thread Avi Rosenschein
On Wed, May 5, 2010 at 5:08 PM, Grant Ingersoll gsing...@apache.org wrote: On May 2, 2010, at 5:50 AM, Avi Rosenschein wrote: On 4/30/10, Grant Ingersoll gsing...@apache.org wrote: On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote: Also, tuning the algorithms to the users can be very

Re: problem in Lucene's ranking function

2010-05-05 Thread José Ramón Pérez Agüera
Hi Robert, thank you very much for your quick response. I have a couple of questions: did you read the papers that I mentioned in my e-mail? Do you think that Lucene's ranking function could have this problem? My concern is not about how to implement different kinds of ranking functions for Lucene,

Re: problem in Lucene's ranking function

2010-05-05 Thread Robert Muir
2010/5/5 José Ramón Pérez Agüera jose.agu...@gmail.com Hi Robert, thank you very much for your quick response. I have a couple of questions: did you read the papers that I mentioned in my e-mail? Yes. Do you think that Lucene's ranking function could have this problem? I know it does.

Re: problem in Lucene's ranking function

2010-05-05 Thread José Ramón Pérez Agüera
Hi Robert, the problem is not the linear combination of fields; the problem is applying the per-field boost factor after the term frequency saturation function and then making the linear combination of fields. Every system that implements BM25F, including Terrier, takes care of that, because if you
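
To make the ordering issue concrete, here is a toy comparison of the two places a field boost can be applied. It is only a sketch under stated assumptions: the saturation function tf / (tf + k1) with k1 = 1.2, the title/body boosts of 3 and 1, and the two example documents are all chosen here for illustration, and idf and length normalization are left out. It is not Lucene's Similarity and not a full BM25F implementation.

```java
public class BoostPlacementSketch {

    static final double K1 = 1.2;   // assumed saturation parameter

    /** BM25-style term frequency saturation; saturate(0) is 0. */
    static double saturate(double tf) {
        return tf / (tf + K1);
    }

    /** Saturate each field, then boost and sum: the ordering the post criticises. */
    static double boostAfterSaturation(double[] tfPerField, double[] boost) {
        double sum = 0.0;
        for (int f = 0; f < tfPerField.length; f++) {
            sum += boost[f] * saturate(tfPerField[f]);
        }
        return sum;
    }

    /** BM25F-style ordering: boost and sum the raw tf, then saturate once. */
    static double boostBeforeSaturation(double[] tfPerField, double[] boost) {
        double combined = 0.0;
        for (int f = 0; f < tfPerField.length; f++) {
            combined += boost[f] * tfPerField[f];
        }
        return saturate(combined);
    }

    /** Document score: sum of per-term scores; tf[term][field]. */
    static double score(double[][] tf, double[] boost, boolean bm25fStyle) {
        double total = 0.0;
        for (double[] termTf : tf) {
            total += bm25fStyle
                    ? boostBeforeSaturation(termTf, boost)
                    : boostAfterSaturation(termTf, boost);
        }
        return total;
    }

    public static void main(String[] args) {
        double[] boost = {3.0, 1.0};             // fields: {title, body}

        // Doc A: one query term, present in both fields.
        double[][] docA = {{1, 1}, {0, 0}};
        // Doc B: two query terms, both in the body only.
        double[][] docB = {{0, 1}, {0, 1}};

        System.out.printf("boost after saturation:  A=%.3f B=%.3f%n",
                score(docA, boost, false), score(docB, boost, false));
        System.out.printf("boost before saturation: A=%.3f B=%.3f%n",
                score(docA, boost, true), score(docB, boost, true));
    }
}
```

With these numbers, boosting after saturation ranks the single-term document A above the two-term document B, while combining the boosted frequencies before saturating ranks B above A.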

Re: problem in Lucene's ranking function

2010-05-05 Thread Robert Muir
2010/5/5 José Ramón Pérez Agüera jose.agu...@gmail.com Hi Robert, the problem is not the linear combination of fields; the problem is applying the per-field boost factor after the term frequency saturation function and then making the linear combination of fields. Every system that implements

Re: problem in Lucene's ranking function

2010-05-05 Thread José Ramón Pérez Agüera
Hi Robert, I will be very happy to see this problem fixed :-) I cannot imagine what reasons people would have to use software with bugs; I guess that other bugs in Lucene get removed. Anyway, if you are finally going to fix the problem, that is good news :-) Thank you very much for your time. jose

Re: problem in Lucene's ranking function

2010-05-05 Thread Yonik Seeley
2010/5/5 José Ramón Pérez Agüera jose.agu...@gmail.com: [...] The consequence is that a document matching a single query term over several fields could score much higher than a document matching several query terms in one field only, One partial workaround that people use is

Re: How can I merge .cfx and .cfs into a single cfs file?

2010-05-05 Thread 张志田
Thank you Mike. Garry - Original Message - From: Michael McCandless luc...@mikemccandless.com To: java-user@lucene.apache.org Sent: Wednesday, May 05, 2010 8:24 PM Subject: Re: How can I merge .cfx and .cfs into a single cfs file? Lucene considers an index with a single .cfx and a

Re: Using IndexReader in the web environment

2010-05-05 Thread Ivan Liu
You may want to look at this: private static IndexSearcher indexSearcher = null; public synchronized IndexSearcher newIndexSearcher() { try { if (null == indexSearcher) { Directory directory = FSDirectory.open(new File(Config.DB_DIR + "/rssindex")); indexSearcher = new