milu07 wrote:
Hello,
My machine is Ubuntu 7.10. I am working with Apache Lucene. I have finished the
indexer and tried the command-line Searcher (the default one
included in the Lucene package: http://lucene.apache.org/java/2_3_1/demo2.html).
When I run this at the command line:
java
Thanks.
Date: Mon, 24 Mar 2008 21:03:13 -0700
From: [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Subject: RE: Field values ...
: The Id and Phone fields are stored. So I can just do a MatchAllQuery as
: you suggested. I have read about field selectors on this mailing list
:
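Since only the Id and Phone fields are needed, a field selector lets the reader materialize just those stored fields instead of the whole document. A minimal plain-Java sketch of that idea (the class and method names below are made up; in Lucene the equivalent is passing a FieldSelector such as MapFieldSelector to IndexReader.document()):

```java
import java.util.*;

// Sketch of the field-selector idea: copy only the stored fields
// you actually need, skipping large ones entirely.
class FieldSelectorSketch {
    static List<Map<String, String>> loadAll(List<Map<String, String>> storedDocs,
                                             Set<String> wantedFields) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> doc : storedDocs) {
            Map<String, String> slim = new LinkedHashMap<>();
            for (String field : wantedFields) {
                if (doc.containsKey(field)) {
                    slim.put(field, doc.get(field)); // keep only Id, Phone, ...
                }
            }
            out.add(slim);
        }
        return out;
    }
}
```

The point is that fields not in the wanted set are never touched, which is exactly what avoids reading large stored fields during a match-all scan.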
Hi Everyone,
We are using Lucene to search an index of around 20 GB with around 3
million documents. We are facing performance issues loading large results
from the index. Based on various posts on the forum and on the documentation,
we have made the following code changes to improve the
Hi Guys,
Has anybody integrated the Spell Checker contributed to Lucene? I need
advice on where to get a free dictionary file (one that contains all
English words) that could be used to create an instance of the
PlainTextDictionary class. For my tests I currently use the corresponding
files from Jazzy
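A PlainTextDictionary is just a word list, one word per line, so any free word list in that shape will do. As a Lucene-free stand-in for what the contrib SpellChecker does with such a list (the class and method names below are made up; the real SpellChecker indexes n-grams of the dictionary words rather than computing raw edit distance), a naive edit-distance suggester:

```java
import java.util.*;

// Naive spell suggestion over a one-word-per-line style dictionary.
class SpellSketch {
    // Classic Levenshtein edit distance, two-row dynamic programming.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] t = prev; prev = cur; cur = t;
        }
        return prev[b.length()];
    }

    // Return the dictionary word closest to the misspelled input.
    static String suggest(String word, List<String> dictionary) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String w : dictionary) {
            int d = distance(word, w);
            if (d < bestDist) { bestDist = d; best = w; }
        }
        return best;
    }
}
```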
Ivan Vasilev wrote:
Hi Guys,
Has anybody integrated the Spell Checker contributed to Lucene?
http://blog.garambrogne.net/index.php?post/2008/03/07/A-lexicon-approach-for-Lucene-index
https://issues.apache.org/jira/browse/LUCENE-1190
I need advice on where to get a free dictionary file
Hi everybody,
I was searching for information about the HitCollector. I was wondering whether
the values of the fields have to be stored or not. I tested it and it worked
both ways, but I'm still not really sure about it.
Second question: can I work with tokenized fields?
Best regards
Jens
On Tue, 2008-03-25 at 18:13 +0530, Shailendra Mudgal wrote:
We are using Lucene to search an index of around 20 GB with around 3
million documents. We are facing performance issues loading large results
from the index. [...]
After all these changes, it seems to be taking around 90 secs
another problem just occurred. These are the results from explain():
0.27576536 = (MATCH) product of:
  0.827296 = (MATCH) sum of:
    0.827296 = (MATCH) sum of:
      0.24544832 = (MATCH) weight(ti:genetik in 1849319), product of:
        0.015469407 = queryWeight(ti:genetik), product of:
Uwe,
This is a little off thread-topic, but I was wondering how your
search relevance and search performance have fared with this
bigram-based index. Is it significantly better than before you used
the NGramAnalyzer?
-jake
On 3/24/08, Uwe Goetzke [EMAIL PROTECTED] wrote:
Hi Ivan,
No, we
Jake,
With the bigram-based index we gave up the struggle to find a well-working
language-based index.
We had implemented Soundex (and other sound-alikes) and hyphenation, but
failed to deliver a search result that users could understand (why is this ranked higher,
and so on...). One reason may be
Hi:
Is there a way to randomly access a term value in a field? E.g.
in my field, content, the terms are: lucene, is, cool
Is there a way to access content[2] - cool?
Thanks
-John
Hi Uwe,
I am curious what NGramStemFilter is. Is it a combination of Porter
stemming and word n-gram identification?
Thanks!
Jay
Uwe Goetzke wrote:
Hi Ivan,
No, we do not use StandardAnalyzer or StandardTokenizer.
Most data is processed by
fTextTokenStream = result = new
Shailendra,
Have a look at the javadocs of HitCollector:
http://lucene.apache.org/java/2_3_0/api/core/org/apache/lucene/search/HitCollector.html
The problem is disk-head movement: when retrieving
the documents during collecting, the disk head has to move
between the inverted index and
: *We also read in one of the posts that we should use bitSet.set(doc)
: instead of calling searcher.doc(id). But we are unable to understand how
: this might help in our case since we will anyway have to load the document
: to get the other required field(company_id). Also we observed that
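The collect-then-fetch pattern under discussion can be sketched without any Lucene classes: during collection only record matching doc ids in a BitSet (cheap, no stored-field reads), then afterwards visit the set bits in increasing doc-id order, so stored-field reads sweep the files in one direction instead of seeking back and forth. The class name below is made up:

```java
import java.util.*;

// Sketch: iterate a BitSet of matching doc ids in increasing order,
// which is the order you want for batch-loading stored fields afterwards.
class CollectThenFetchSketch {
    static List<Integer> matchedDocsInOrder(BitSet matches) {
        List<Integer> ids = new ArrayList<>();
        for (int doc = matches.nextSetBit(0); doc >= 0;
             doc = matches.nextSetBit(doc + 1)) {
            ids.add(doc); // ascending doc ids -> mostly sequential disk reads
        }
        return ids;
    }
}
```

In a real HitCollector the collect() callback would just call matches.set(doc); the searcher.doc(id) calls happen later, over this ordered id list.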
Jay,
Have a look at Lucene config, it's all there, including tests. This filter
will take a token such as foobar and chop it up into n-grams (e.g. foobar ->
fo oo ob ba ar would be the set of bi-grams). You can specify the n-gram size
and even min and max n-gram sizes.
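The chopping Otis describes can be sketched in a few lines. This is a plain-Java illustration of the mechanism, not the contrib filter itself, and the class name is made up:

```java
import java.util.*;

// Chop a token into all n-grams with sizes between minGram and maxGram.
class NGramSketch {
    static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            // slide a window of width n across the token
            for (int i = 0; i + n <= token.length(); i++) {
                grams.add(token.substring(i, i + n));
            }
        }
        return grams;
    }
}
```

With minGram = maxGram = 2, "foobar" yields exactly the bi-gram set from the example above.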
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Sorry, I could not find the filter in the 2.3 API class list (core +
contrib + test). I am not aware of a Lucene config file either. Could you
please tell me where it is in 2.3 release?
Thanks!
Jay
Otis Gospodnetic wrote:
Jay,
Have a look at Lucene config, it's all there, including tests.
Hi Jens,
I'm having a bit of a hard time following this, so perhaps you could
rephrase, or show your sample code or explain a bit more about what
you are trying to do at a higher level?
Cheers,
Grant
On Mar 25, 2008, at 10:46 AM, JensBurkhardt wrote:
Hi everybody,
I was searching for
On Mar 25, 2008, at 12:10 PM, JensBurkhardt wrote:
As you can see, both are exactly the same. The thing I don't understand is
that the two documents have different document boosts (the first one
got a boost of 1.62, the second of 1.65) - the boosts are different
because the two books
Hi Jay,
Sorry, lapsus calami, that would be Lucene *contrib*.
Have a look:
http://lucene.apache.org/java/2_3_1/api/contrib-analyzers/index.html
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Jay [EMAIL PROTECTED]
To:
Hi Otis,
I checked that contrib before and could not find NgramStemFilter. Am I
missing other contrib?
Thanks for the link!
Jay
Otis Gospodnetic wrote:
Hi Jay,
Sorry, lapsus calami, that would be Lucene *contrib*.
Have a look:
Sorry, I wrote this stuff, but forgot the naming.
Look:
http://lucene.apache.org/java/2_3_1/api/contrib-analyzers/org/apache/lucene/analysis/ngram/package-summary.html
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: yu [EMAIL PROTECTED]
To:
Sorry for my ignorance, I am looking for
NgramStemFilter specifically.
Are you suggesting that it's the same as NGramTokenFilter? Does it have
stemming in it?
Thanks again.
Jay
Otis Gospodnetic wrote:
Sorry, I wrote this stuff, but forgot the naming.
Look:
I am not sure how term vectors would help me. Term vectors are ordered by
frequency, not lexicographically. Since I know the terms in the dictionary are
ordered lexicographically, it seems possible to randomly get the nth term in
the dictionary without having to seek to it.
Thoughts?
Thanks
-John
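If the term dictionary were materialized as a sorted array, John's content[2]-style access and the reverse lookup become trivial; Lucene's own TermEnum only iterates sequentially from a seek point, so this plain-Java sketch (made-up class name) just shows the idea:

```java
import java.util.*;

// Sketch: a field's term dictionary as a lexicographically sorted array.
class TermDictSketch {
    // nth term in lexicographic order: plain index access.
    static String termAt(String[] sortedTerms, int n) {
        return sortedTerms[n];
    }

    // Ordinal of a term: binary search over the sorted array.
    static int ordinalOf(String[] sortedTerms, String term) {
        return Arrays.binarySearch(sortedTerms, term);
    }
}
```

Note the index is by lexicographic position, not document order: for the terms lucene, is, cool the sorted dictionary is [cool, is, lucene], so position 2 is "lucene", not "cool".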
On