Re: Call Lucene default command line Search from PHP script

2008-03-25 Thread Mathieu Lecarme
milu07 wrote: Hello, My machine is Ubuntu 7.10. I am working with Apache Lucene. I am done with the indexer and have tried the command-line Searcher (the default command line included in the Lucene package: http://lucene.apache.org/java/2_3_1/demo2.html). When I use this at command line: java

RE: Field values ...

2008-03-25 Thread Dragon Fly
Thanks. Date: Mon, 24 Mar 2008 21:03:13 -0700 From: [EMAIL PROTECTED] To: java-user@lucene.apache.org Subject: RE: Field values ... : The Id and Phone fields are stored. So I can just do a MatchAllQuery as : you suggested. I have read about field selectors on this mailing list :

Improving Index Search Performance

2008-03-25 Thread Shailendra Mudgal
Hi Everyone, We are using Lucene to search on an index of around 20G size with around 3 million documents. We are facing performance issues loading large results from the index. Based on the various posts on the forum and documentation, we have made the following code changes to improve the

Integrating Spell Checker contributed to Lucene

2008-03-25 Thread Ivan Vasilev
Hi Guys, Has anybody integrated the Spell Checker contributed to Lucene? I need advice on where to get a free dictionary file (one that contains all words in English) that could be used to create an instance of the PlainTextDictionary class. For my tests I currently use the corresponding files from Jazzy

Re: Integrating Spell Checker contributed to Lucene

2008-03-25 Thread Mathieu Lecarme
Ivan Vasilev wrote: Hi Guys, Has anybody integrated the Spell Checker contributed to Lucene? http://blog.garambrogne.net/index.php?post/2008/03/07/A-lexicon-approach-for-Lucene-index https://issues.apache.org/jira/browse/LUCENE-1190 I need advice on where to get a free dictionary file

hitcollector topdocs

2008-03-25 Thread JensBurkhardt
Hi everybody, I was searching for information about the HitCollector. I was wondering whether the values of the fields have to be stored or not. I tested it and it worked both ways, but I'm still not really sure about it. Second question: can I work with tokenized fields? Best regards Jens -- View

Re: Improving Index Search Performance

2008-03-25 Thread Toke Eskildsen
On Tue, 2008-03-25 at 18:13 +0530, Shailendra Mudgal wrote: We are using Lucene to search on an index of around 20G size with around 3 million documents. We are facing performance issues loading large results from the index. [...] After all these changes, it seems to be taking around 90 secs

Re: explain() - fieldnorm

2008-03-25 Thread JensBurkhardt
Another problem just occurred. These are the results from explain(): 0.27576536 = (MATCH) product of: 0.827296 = (MATCH) sum of: 0.827296 = (MATCH) sum of: 0.24544832 = (MATCH) weight(ti:genetik in 1849319), product of: 0.015469407 = queryWeight(ti:genetik), product of:

Re: feedback: Indexing speed improvement lucene 2.2-2.3.1

2008-03-25 Thread Jake Mannix
Uwe, This is a little off thread-topic, but I was wondering how your search relevance and search performance have fared with this bigram-based index. Is it significantly better than before you used the NGramAnalyzer? -jake On 3/24/08, Uwe Goetzke [EMAIL PROTECTED] wrote: Hi Ivan, No, we

AW: feedback: Indexing speed improvement lucene 2.2-2.3.1

2008-03-25 Thread Uwe Goetzke
Jake, With the bigram-based index we gave up the struggle to find a well-working language-based index. We had implemented soundex (or different sound-alikes) and hyphenation but failed to deliver a user-explainable search result (why is this ranked higher and so on...). One reason may be

random accessing term value

2008-03-25 Thread John Wang
Hi: Is there a way to randomly access a term value in a field? E.g. in my field, content, the terms are: lucene, is, cool. Is there a way to access content[2] -> cool? Thanks -John

Re: AW: feedback: Indexing speed improvement lucene 2.2-2.3.1

2008-03-25 Thread Jay
Hi Uwe, I am curious what NGramStemFilter is. Is it a combination of Porter stemming and word n-gram identification? Thanks! Jay Uwe Goetzke wrote: Hi Ivan, No, we do not use StandardAnalyzer or StandardTokenizer. Most data is processed by fTextTokenStream = result = new

Re: Improving Index Search Performance

2008-03-25 Thread Paul Elschot
Shailendra, Have a look at the javadocs of HitCollector: http://lucene.apache.org/java/2_3_0/api/core/org/apache/lucene/search/HitCollector.html The problem is with the use of the disk head: when retrieving the documents during collecting, the disk head has to move between the inverted index and

Re: Improving Index Search Performance

2008-03-25 Thread Chris Hostetter
: *We also read in one of the posts that we should use bitSet.set(doc) : instead of calling searcher.doc(id). But we are unable to understand how : this might help in our case since we will anyway have to load the document : to get the other required field (company_id). Also we observed that
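The bitSet.set(doc) suggestion quoted above can be sketched in plain Java, as a rough illustration only. The class DocIdCollectorSketch and its methods are hypothetical names, not Lucene API; collect(int, float) merely mirrors the shape of HitCollector.collect(int doc, float score) from the 2.3 javadocs. The idea is to record only document ids while collecting, and to load stored fields afterwards in increasing id order.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch, not Lucene code: remembers matching doc ids in a BitSet
// during collection so that documents can be loaded later, in id order.
public class DocIdCollectorSketch {
    private final BitSet matches = new BitSet();

    // Same shape as HitCollector.collect(int doc, float score); the score is ignored here.
    public void collect(int doc, float score) {
        matches.set(doc);
    }

    // Returns the collected doc ids in increasing order, without duplicates.
    public List<Integer> docIdsInOrder() {
        List<Integer> ids = new ArrayList<Integer>();
        for (int d = matches.nextSetBit(0); d >= 0; d = matches.nextSetBit(d + 1)) {
            ids.add(d);
        }
        return ids;
    }

    public static void main(String[] args) {
        DocIdCollectorSketch c = new DocIdCollectorSketch();
        c.collect(42, 1.0f);
        c.collect(7, 0.5f);
        c.collect(19, 0.8f);
        // In a real setup, each id would now be passed to searcher.doc(id) after collecting has finished.
        System.out.println(c.docIdsInOrder());
    }
}
```

Whether this helps in the company_id case depends on how many documents still have to be loaded afterwards; the sketch only separates collecting from loading.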

Re: AW: feedback: Indexing speed improvement lucene 2.2-2.3.1

2008-03-25 Thread Otis Gospodnetic
Jay, Have a look at Lucene config, it's all there, including tests. This filter will take a token such as foobar and chop it up into n-grams (e.g. foobar -> fo oo ob ba ar would be a set of bi-grams). You can specify the n-gram size and even min and max n-gram size. Otis -- Sematext --
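The n-gram chopping described above can be illustrated with a minimal plain-Java sketch. This is not the Lucene filter itself: NGramSketch and ngrams are hypothetical names, and the order in which the real filter emits grams may differ. The minGram and maxGram parameters stand in for the min and max n-gram sizes mentioned in the message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of character n-grams; not the Lucene contrib filter.
public class NGramSketch {

    // Returns every substring of token whose length is between minGram and maxGram
    // (inclusive), shortest grams first, each length scanned left to right.
    public static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> out = new ArrayList<String>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= token.length(); i++) {
                out.add(token.substring(i, i + n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Bi-grams of "foobar", as in the example from the message.
        System.out.println(ngrams("foobar", 2, 2)); // prints [fo, oo, ob, ba, ar]
    }
}
```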

Re: AW: feedback: Indexing speed improvement lucene 2.2-2.3.1

2008-03-25 Thread Jay
Sorry, I could not find the filter in the 2.3 API class list (core + contrib + test). I am not aware of a Lucene config file either. Could you please tell me where it is in the 2.3 release? Thanks! Jay Otis Gospodnetic wrote: Jay, Have a look at Lucene config, it's all there, including tests.

Re: hitcollector topdocs

2008-03-25 Thread Grant Ingersoll
Hi Jens, I'm having a bit of a hard time following this, so perhaps you could rephrase, or show your sample code or explain a bit more about what you are trying to do at a higher level? Cheers, Grant On Mar 25, 2008, at 10:46 AM, JensBurkhardt wrote: Hi everybody, I was searching for

Re: explain() - fieldnorm

2008-03-25 Thread Grant Ingersoll
On Mar 25, 2008, at 12:10 PM, JensBurkhardt wrote: As you can see, both are exactly the same. The thing I don't understand is that the two documents have different document boosts (the first one got a boost of 1.62, the second of 1.65) - the boosts are different because the two books

Re: AW: feedback: Indexing speed improvement lucene 2.2-2.3.1

2008-03-25 Thread Otis Gospodnetic
Hi Jay, Sorry, lapsus calami, that would be Lucene *contrib*. Have a look: http://lucene.apache.org/java/2_3_1/api/contrib-analyzers/index.html Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Jay [EMAIL PROTECTED] To:

Re: AW: feedback: Indexing speed improvement lucene 2.2-2.3.1

2008-03-25 Thread yu
Hi Otis, I checked that contrib before and could not find NGramStemFilter. Am I missing another contrib? Thanks for the link! Jay Otis Gospodnetic wrote: Hi Jay, Sorry, lapsus calami, that would be Lucene *contrib*. Have a look:

Re: AW: feedback: Indexing speed improvement lucene 2.2-2.3.1

2008-03-25 Thread Otis Gospodnetic
Sorry, I wrote this stuff, but forgot the naming. Look: http://lucene.apache.org/java/2_3_1/api/contrib-analyzers/org/apache/lucene/analysis/ngram/package-summary.html Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: yu [EMAIL PROTECTED] To:

Re: AW: feedback: Indexing speed improvement lucene 2.2-2.3.1

2008-03-25 Thread yu
Sorry for my ignorance; I am looking for NGramStemFilter specifically. Are you suggesting that it's the same as NGramTokenFilter? Does it have stemming in it? Thanks again. Jay Otis Gospodnetic wrote: Sorry, I wrote this stuff, but forgot the naming. Look:

Re: random accessing term value

2008-03-25 Thread John Wang
I am not sure how term vectors would help me. Term vectors are ordered by frequency, not in lex order. Since I know that in the dictionary the terms are ordered by lex, it seems possible for me to randomly get the nth term in the dictionary without having to seek to it. Thoughts? Thanks -John On