Hi
is it possible (or is there a tricky way) to search with a query in which we
can set an equality between two fields?
for example:
Document:
field1 field2 field3 field4
Query:
field1:"test phrase" AND field2:test AND field3:field4
in this query we said that do
Have you looked at the explains to see what is coming out of the
FuzzyQuery? Also, are you using Hits to get that score? Scores get
normalized to 1 by that process.
-Grant
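For what it's worth, a minimal sketch of pulling an explanation for a single hit
(Lucene 2.x API; the searcher, query, and document id are assumed to come from elsewhere):

import java.io.IOException;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class ExplainSketch {
  // Prints the full scoring breakdown for one document; the score shown
  // here is the raw Lucene score, not the value normalized by Hits.
  public static void printExplanation(Searcher searcher, Query query, int docId)
      throws IOException {
    Explanation exp = searcher.explain(query, docId);
    System.out.println(exp.toString());
  }
}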
On Apr 11, 2007, at 2:06 AM, Michael Barbarelli wrote:
Hello.
I am using Lucene to submit fuzzy queries against an
On 11 Apr 2007, at 04:21, Grant Ingersoll wrote:
Would some sort of caching strategy work? How big is your overall
collection?
Also, lately there have been a few threads on TV (term vector)
performance. I don't recall anyone having actively profiled or
examined it for improvements, so
Well, there's nothing here to help you with, since you haven't provided
any information to diagnose the problem. Like:
What queries are actually produced in the different cases?
Use query.toString().
I'm immediately suspicious of any statement that my custom
code shouldn't be the problem. Try the test
On Apr 11, 2007, at 9:07 AM, karl wettin wrote:
On 11 Apr 2007, at 04:21, Grant Ingersoll wrote:
Would some sort of caching strategy work? How big is your overall
collection?
Also, lately there have been a few threads on TV (term vector)
performance. I don't recall anyone having actively
Hi Grant.
Yes, I'm getting the score from the Hits collection. And yes, they get
normalized to 1, which is what I don't want.
Or, I can leave the Hits objects as is, but I know Lucene also must
calculate a raw difference as part of the overall score calculation.
How can I get at that value?
Go for a HitCollector. In particular, TopDocs will give you the raw
scores.
Erick
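A minimal sketch of that, against the Lucene 2.x API (the searcher and query are
assumed to exist already):

import java.io.IOException;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class RawScoreSketch {
  public static void printRawScores(IndexSearcher searcher, Query query) throws IOException {
    // Searcher.search(Query, Filter, int) returns a TopDocs whose scores
    // are the raw Lucene scores, not the values normalized by Hits.
    TopDocs topDocs = searcher.search(query, null, 10);
    for (int i = 0; i < topDocs.scoreDocs.length; i++) {
      ScoreDoc sd = topDocs.scoreDocs[i];
      System.out.println("doc=" + sd.doc + " rawScore=" + sd.score);
    }

    // A HitCollector also sees every matching document with its raw score.
    searcher.search(query, new HitCollector() {
      public void collect(int doc, float score) {
        System.out.println("doc=" + doc + " rawScore=" + score);
      }
    });
  }
}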
On 4/11/07, Michael Barbarelli [EMAIL PROTECTED] wrote:
Hi Grant.
Yes, I'm getting the score from the Hits collection. And yes, they get
normalized to 1, which is what I don't want.
Or, I can leave the Hits
Not really. The explain scores aren't normalized and I also couldn't
find a way to get the explain data as anything other than a whitespace
formatted text blob from Solr. Keep in mind that they need confidence
factors from one query to the next. With the explain scores, they can
have wildly
Oh geeze. Gmail ripped my pretty table to shreds. Let me try again:
A
--
id | title | title score | director | director score | year | year score | overall score
B
Mike Klaas elaborates on syntax:
+(-A +B) - must match (-A +B), i.e. must contain B and must not contain A
-(-A +B) - must not match (-A +B), i.e. must not (contain B and not contain A)
OK, the takeaway I'm getting from this is that these clauses read very much
like English and behave just the same.
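For what it's worth, a small sketch (Lucene 2.x API; the field name and terms are
made up) of building the first of those clauses programmatically:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class NestedClauseDemo {
  public static void main(String[] args) {
    // +(-A +B): the outer query requires a sub-query that itself
    // requires B and prohibits A. Field name "f" and the terms are made up.
    BooleanQuery inner = new BooleanQuery();
    inner.add(new TermQuery(new Term("f", "a")), BooleanClause.Occur.MUST_NOT);
    inner.add(new TermQuery(new Term("f", "b")), BooleanClause.Occur.MUST);

    BooleanQuery outer = new BooleanQuery();
    outer.add(inner, BooleanClause.Occur.MUST);   // +(-A +B)
    // Using BooleanClause.Occur.MUST_NOT here instead would express -(-A +B).
    System.out.println(outer);                    // prints: +(-f:a +f:b)
  }
}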
Thank you Erick! Will give it a shot!
On 4/11/07, Erick Erickson [EMAIL PROTECTED] wrote:
Go for a HitCollector. In particular, TopDocs will give you the raw
scores.
Erick
On 4/11/07, Michael Barbarelli [EMAIL PROTECTED] wrote:
Hi Grant.
Yes, I'm getting the score from the Hits
Are there any plans to put together a Lucene BOF at Amsterdam?
--
Sami Siren
+1
Thanks for your reply. I should have given more information and will keep
this in mind for my future queries.
Regarding this one, I have already done most of the things you asked, like:
1. I am confirming what query is getting executed by using query.toString()
2. I read a lot of posts in the forum
On 4/11/07, Koji Sekiguchi [EMAIL PROTECTED] wrote:
In the program, I added these three documents to the index,
then deleted all of them, and then added them to the index again on purpose.
If I optimize the index, idf becomes 1.0 with Lucene 2.1 (uncomment the line in
the program).
Is this a feature?
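This isn't the attached program, but a rough sketch of the sequence described
(Lucene 2.x API; the field name, analyzer, and delete-by-document-number approach
are assumptions):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;

public class IdfRepro {
  static final String[] TEXTS = {
    "Java programming is required to write Lucene application.",
    "Java is a popular computer language. I like Java.",
    "Perl is not a kind of jewelry. It is a programming language."
  };

  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    addAll(dir, true);                        // add the three documents

    IndexReader reader = IndexReader.open(dir);
    for (int i = 0; i < TEXTS.length; i++) {  // delete all of them by document number
      reader.deleteDocument(i);
    }
    reader.close();

    addAll(dir, false);                       // add them again

    // Uncommenting the next three lines merges away the deleted documents,
    // which is what changes the idf after an optimize:
    // IndexWriter w = new IndexWriter(dir, new StandardAnalyzer(), false);
    // w.optimize();
    // w.close();

    IndexSearcher searcher = new IndexSearcher(dir);
    Query q = new TermQuery(new Term("contents", "java"));
    TopDocs td = searcher.search(q, null, 10);
    if (td.scoreDocs.length > 0) {
      System.out.println(searcher.explain(q, td.scoreDocs[0].doc));  // includes the idf factor
    }
    searcher.close();
  }

  static void addAll(RAMDirectory dir, boolean create) throws Exception {
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), create);
    for (int i = 0; i < TEXTS.length; i++) {
      Document doc = new Document();
      doc.add(new Field("contents", TEXTS[i], Field.Store.YES, Field.Index.TOKENIZED));
      writer.addDocument(doc);
    }
    writer.close();
  }
}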
Hello,
I have the following three documents in my index:
- Java programming is required to write Lucene application.
- Java is a popular computer language. I like Java.
- Perl is not a kind of jewelry. It is a programming language.
With Lucene 2.0, if I search for java and print the explanation, the
: here is that it's not that I'm finding different documents, but rather it's
: the same set and they will be ranked differently.
:
: Can you point me at a resource that explains the ranking and coord factors?
: I'm trying to understand scoring better. Going to the BooleanQuery
The best
On Wednesday 11 April 2007 18:51, Lokeya wrote:
Thanks for your reply. I should have given more information and will
keep this in mind for my future queries.
If nothing else helps, please write a small, standalone test-case that
shows the problem. This can then easily be debugged by someone
Hi,
I've just started using Lucene. Can anybody assist me in calculating
the term frequencies of the terms (words) that occur in a document (*.txt)
when a particular doc is submitted?
Say when I submit sample.txt, I should first analyze the document
with a standard analyzer, then the term
Hi.
I have encountered a problem searching in my application because of
inconsistent Unicode normalization forms in the corpus (and the queries). I
would like to normalize to form NFKD in an analyzer (I think). I was thinking
about creating a filter similar to the LowerCaseFilter that would do
Add Term Vectors to your Field during indexing. See the Field
constructors. To get a Term Vector out, see the
IndexReader.getTermFreqVector method.
-Grant
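A rough sketch of both ends of that (Lucene 2.x API; the field name "contents" is made up):

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;

public class TermFreqSketch {
  // Indexing side: ask for a term vector when the field is created.
  static Document makeDoc(String text) {
    Document doc = new Document();
    doc.add(new Field("contents", text,
        Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
    return doc;
  }

  // Retrieval side: read the vector back for a given document number.
  static void printTermFreqs(IndexReader reader, int docId) throws IOException {
    TermFreqVector tfv = reader.getTermFreqVector(docId, "contents");
    if (tfv == null) {
      return;                                 // the field was indexed without term vectors
    }
    String[] terms = tfv.getTerms();          // distinct terms in this document's field
    int[] freqs = tfv.getTermFrequencies();   // parallel array of within-document counts
    for (int i = 0; i < terms.length; i++) {
      System.out.println(terms[i] + ": " + freqs[i]);
    }
  }
}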
On Apr 11, 2007, at 3:23 PM, sai hariharan wrote:
Hi,
I've just started using Lucene. Can anybody assist me in calculating
the term
Hello Lucene users,
I'm rather new to Lucene and Java but have done work with other
search engines some time before.
Right now I'm trying my hand (and luck) at a 'search as you type'
sort of high-performance search a la Google Suggest.
There are meanwhile a number of examples on the net for
Steffen Heinrich wrote:
Normally an IndexWriter uses only one default Analyzer for all its
tokenizing business. And while it is apparently possible to supply
a different Analyzer instance when adding a specific document, there seems
to be no way to use different analyzers on different fields
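For what it's worth, the Lucene 2.x analysis package does ship a PerFieldAnalyzerWrapper
for this; a minimal sketch (the field names and analyzers are made up):

import java.io.IOException;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class PerFieldExample {
  public static void main(String[] args) throws IOException {
    // StandardAnalyzer is the default; individual fields get their own analyzers.
    PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    wrapper.addAnalyzer("title", new SimpleAnalyzer());
    wrapper.addAnalyzer("tags", new WhitespaceAnalyzer());

    // The wrapper is passed wherever a single Analyzer is expected.
    IndexWriter writer = new IndexWriter(new RAMDirectory(), wrapper, true);
    writer.close();
  }
}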
Rather than using a search, have you thought about using a TermEnum?
It's much, much, much faster than a query. What it allows you to do
is enumerate the terms in the index on a per-field basis. Essentially, this
is what happens when you do a PrefixQuery as BooleanClauses are
added, but you have
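A sketch of that suggestion against the Lucene 2.x TermEnum API (the field name and
prefix are placeholders):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class PrefixTermWalk {
  // Enumerate the terms of one field that start with a given prefix.
  public static void walk(IndexReader reader, String field, String prefix) throws IOException {
    TermEnum te = reader.terms(new Term(field, prefix));  // positioned at the first term >= prefix
    try {
      do {
        Term t = te.term();
        if (t == null || !t.field().equals(field) || !t.text().startsWith(prefix)) {
          break;                                          // ran past the field or the prefix
        }
        System.out.println(t.text() + " (docFreq=" + te.docFreq() + ")");
      } while (te.next());
    } finally {
      te.close();
    }
  }
}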
: Not really. The explain scores aren't normalized and I also couldn't
: find a way to get the explain data as anything other than a whitespace
: formatted text blob from Solr. Keep in mind that they need confidence
the default way Solr dumps score explanations is just as plain text, but
the
: I have encountered a problem searching in my application because of
: inconsistent Unicode normalization forms in the corpus (and the
: queries). I would like to normalize to form NFKD in an analyzer (I
: think). I was thinking about creating a filter similar to the
I'm very naive to the
Walt Stoneburner wrote:
Does +(A1 A2 A3) +(B1 B2 B3) -(C1 C2 C3) find documents that have at least
one A -and- at least one B, but never any Cs? ...to which I'm now given to
understand the answer is yes. And understand why.
Well, that example would follow standard boolean logic if that's the
I have gone through the mailing list in search of posts about this error.
Though there are many, I feel my problem is a little different, and I would
like to get some advice on it.
Details:
1. Using a machine with 2 GB of RAM
2. Created an index of size 200 MB.
3. Trying to do a search on this for
That certainly seems odd. How much memory are you allocating
to your JVM?
Erick
On 4/11/07, Lokeya [EMAIL PROTECTED] wrote:
I have gone through the mailing list in search of posts for this error.
Though there are many, I feel my problem is a little different, and I would
like to get some advice
Yonik,
Thank you for your explanation.
Incidentally, I learned of this issue from my customer. They are using Solr.
To reproduce the issue with Solr, post exampledocs/*.xml twice
and issue a query with q=ipod&debugQuery=on.
This should be the same for Lucene 2.0 and 2.1.
I understand. But I think we
It is using the default size allocated by the OS; I have no idea exactly how
much that is. But when I use -Xmx1024m and run, this does not occur. Also, I
made a change in that loop, now keeping only the Document hitDoc = hits.doc(i);
line, and that's where it starts throwing the error. But I
On 4/11/07, Chris Hostetter [EMAIL PROTECTED] wrote:
: I have encountered a problem searching in my application because of
: inconsistent Unicode normalization forms in the corpus (and the
: queries). I would like to normalize to form NFKD in an analyzer (I
: think). I was thinking about
On 4/11/07, Mike Klaas [EMAIL PROTECTED] wrote:
Unicode characters do not map
precisely to code points: a single character can often be represented
via a single codepoint or a combination of two (surrogate pair).
I normally hear surrogates in the context of UTF-16 after the code point space
Yonik Seeley wrote:
have no idea how java's String class handles this--I doubt it does any
intelligent normalization.
UTF-16 surrogates are handled as of Java5.
And as of Java6 we have the java.text.Normalizer utility.
Daniel
--
Daniel Noll
Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW
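Putting those two hints together, a minimal sketch of the NFKD filter described earlier
in the thread, assuming Java 6 for java.text.Normalizer and the Lucene 2.x TokenFilter
API (the class name is made up):

import java.io.IOException;
import java.text.Normalizer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Analogous to LowerCaseFilter: rewrite each token's text into NFKD form.
public class NFKDNormalizationFilter extends TokenFilter {
  public NFKDNormalizationFilter(TokenStream input) {
    super(input);
  }

  public Token next() throws IOException {
    Token t = input.next();
    if (t == null) {
      return null;
    }
    String normalized = Normalizer.normalize(t.termText(), Normalizer.Form.NFKD);
    return new Token(normalized, t.startOffset(), t.endOffset(), t.type());
  }
}

The same filter would need to sit in the query-time analyzer as well, so the index and
the queries agree on the normalization form.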
On 4/11/07, Yonik Seeley [EMAIL PROTECTED] wrote:
On 4/11/07, Mike Klaas [EMAIL PROTECTED] wrote:
Unicode characters do not map
precisely to code points: a single character can often be represented
via a single codepoint or a combination of two (surrogate pair).
I normally hear surrogates