Bruno Dery wrote:
Thanks for the help, you're right, your example works. However, looking in
Luke I also see only ones (1 1 1) as the document boost.
Then perhaps this value should be removed from Luke's display ...
because it will always read 1, and it's a correct value (see below).
I
Just to clarify here: yes, you really should have a single JVM with a
single instance of IndexWriter, but use multiple threads calling
IndexWriter.addDocument.
Under the hood, IndexWriter can make use of a lot of concurrency, so
you should see a substantial gain in indexing throughput if you use
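A minimal sketch of the pattern described above: one shared writer instance, several threads adding documents to it. To keep it runnable without the Lucene jar, `FakeWriter` is a hypothetical stand-in for IndexWriter that only models a thread-safe `addDocument`; the class and method names are assumptions for illustration, not Lucene's API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SharedWriterDemo {
    // Hypothetical stand-in for the single shared IndexWriter.
    static class FakeWriter {
        private final ConcurrentLinkedQueue<String> docs = new ConcurrentLinkedQueue<>();

        void addDocument(String doc) {
            docs.add(doc); // safe to call from many threads at once
        }

        int count() {
            return docs.size();
        }
    }

    // Adds every document through one writer, using a fixed pool of worker threads.
    static int indexAll(List<String> docs, int threads) {
        FakeWriter writer = new FakeWriter(); // one instance shared by all threads
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String doc : docs) {
            pool.execute(() -> writer.addDocument(doc));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return writer.count();
    }

    public static void main(String[] args) {
        int added = indexAll(Arrays.asList("doc a", "doc b", "doc c", "doc d"), 2);
        System.out.println("documents added: " + added);
    }
}
```

With a real index, the same shape would apply: create the writer once, hand it to the workers, and close it only after all of them have finished.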
Dear experts,
I need to store and index a string of text into Lucene, and later I
want to get the Id of each term inside this string. Is it possible?
How can I do that?
I want a unique association, term (in my case a word) - Id. I know
that if I delete a document, the dictionary changes. Does
The id does change. You need to index your own id field with the document.
Ilias Flaounas wrote:
Dear experts,
I need to store and index a string of text into Lucene, and later I
want to get the Id of each term inside this string. Is it possible?
How can I do that?
I want a unique
Hi,
I understand optimizing could take longer when the index is bigger, so it
might take a while when the index is huge.
I think I remember seeing something on the Lucene list about optimizing, not
to the optimum case but only to a less than optimum state, using less
time. Is that correct?
Does
Hi there!
Currently I am looking for an expert developer/consultant who can assist
my development team with an implementation of Lucene for an exciting and
innovative project in Amsterdam, Holland.
This is for a scalable, robust and high-performing web-based system
running in a Java EE
I want to have IDs for the terms (words) not the documents!
Also, I need the same ID for a word if it appears in more than one document.
Example:
Doc1: The sea is blue
Doc2: Sky is blue
For these two docs the dictionary would be [the]-1 [sea]-2 [is]-3
[blue]-4 [sky]-5
So I want to represent
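A minimal sketch of the word-to-ID dictionary from the example above, kept outside Lucene in a plain map. It assumes whitespace tokenization and lowercasing, and it hands out IDs in order of first appearance; the class name `TermIds` is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TermIds {
    // Gives each distinct lowercased word the next free ID, in order of first appearance.
    static Map<String, Integer> assign(List<String> docs) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (String doc : docs) {
            for (String word : doc.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
                if (!word.isEmpty() && !ids.containsKey(word)) {
                    ids.put(word, ids.size() + 1);
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Integer> ids = assign(Arrays.asList("The sea is blue", "Sky is blue"));
        System.out.println(ids); // prints {the=1, sea=2, is=3, blue=4, sky=5}
    }
}
```

Because the map is only ever appended to, a word keeps its ID no matter how many documents it appears in, and deleting a document does not renumber anything.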
If you haven't seen it, a good source for this is here:
http://wiki.apache.org/lucene-java/Support
Though that's not as nice as having people contact you :)
Kiffin Gish wrote:
Hi there!
Currently I am looking for an expert developer/consultant who can assist
my development team with an
You can check out the file format of Lucene's term dictionary here:
http://lucene.apache.org/java/docs/fileformats.html#Term%20Dictionary
That might give you some insight.
Lucene does not keep ids for terms as far as I can tell, though...just for
documents...and then the id is really just an
My documents all have a field with a variable number of terms
(but rather few):
Doc1.field = foo bar gro
Doc2.field = foo bar gro mot slu
Now I would like to search using the terms foo bar gro
Problem 1:
I would like to express that at least any two of the three terms
must match. Do I have to construct
I am not an expert but I think you can solve problem 1 by overriding the
coord function in the similarity class:
1. coord(q,d) is a score factor based on how many of the query terms
are found in the specified document. Typically, a document that contains
more of the query's terms will
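A rough sketch of the two ideas in this thread, using the example fields from the question: counting how many query terms a field contains (to require "at least two of three"), and a coord-style factor computed as matched terms divided by total query terms. That ratio is, as far as I know, what Lucene's default similarity returns for coord, but treat it as an assumption here. The class `MinMatch` and its methods are hypothetical and work on plain strings, not on a Lucene index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MinMatch {
    // Number of distinct query terms that occur in the whitespace-separated field text.
    static int overlap(List<String> queryTerms, String fieldText) {
        Set<String> fieldTerms = new HashSet<>(Arrays.asList(fieldText.trim().split("\\s+")));
        int matched = 0;
        for (String term : new HashSet<>(queryTerms)) {
            if (fieldTerms.contains(term)) {
                matched++;
            }
        }
        return matched;
    }

    // Coord-style factor: fraction of the query terms that were matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // True when at least `min` of the query terms occur in the field.
    static boolean atLeast(List<String> queryTerms, String fieldText, int min) {
        return overlap(queryTerms, fieldText) >= min;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("foo", "bar", "gro");
        for (String field : Arrays.asList("foo bar gro", "foo bar gro mot slu", "foo mot slu")) {
            int matched = overlap(query, field);
            System.out.println(field + " -> matched=" + matched
                    + ", coord=" + coord(matched, query.size())
                    + ", atLeastTwo=" + atLeast(query, field, 2));
        }
    }
}
```

The third field in `main` is an added illustration (not from the thread) of a document that matches only one term and is therefore rejected.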
Hi Group,
I need to display a list of the tokens (tags) on my site that have the
most occurrences in my index. One way I can think of is to keep track of all
tokens during analysis and display them accordingly. Is there any other way?
e.g. if I want to display tokens in order of
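A minimal sketch of the approach the poster mentions, keeping a count per token and listing the most frequent ones first. It works on a plain list of already-analyzed tokens rather than on a Lucene index; the class `TopTokens` and the sample tokens are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopTokens {
    // Returns up to n (token, count) entries, highest count first, ties broken alphabetically.
    static List<Map.Entry<String, Integer>> top(List<String> tokens, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("java", "lucene", "java", "index", "lucene", "java");
        for (Map.Entry<String, Integer> e : top(tokens, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```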
Hello to you all!
I've programmed a portlet search solution using Lucene.
Now that our new website is close to release, the file volume
has increased quickly.
My Lucene-based index program worked fine until the number of files
increased so much. Now it runs for about 10 minutes and then gives
Hi, Jan,
You really need to be more specific about your configuration and error log.
Lucene surely has been used on many large websites.
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo:
You have no stack trace? Come on, man...you must be able to get a stack
trace :) There is no way to tell what is causing that very generic
error. I would say though: certainly not the size of your index.
I suppose, out of curiosity, how big is it? I'll bet my two broken
wrists that's not the
On 31 Oct 2007, at 15:18, Cool Coder wrote:
Hi Group,
I need to display a list of the tokens (tags) on my site
that have the most occurrences in my index. One way I can think
of is to keep track of all tokens during analysis and display
them accordingly. Is there any other way?
Thanks! I also noticed there is a mention of this in the documentation
of Document.getBoost():
Note: This value is not stored directly with the document in the index.
Documents returned from IndexReader.document(int) and Hits.doc(int) may
thus not have the same value present as when this document
On Wednesday 31 October 2007 14:51:12 Tobias Hill wrote:
My documents all have a field with a variable number of terms
(but rather few):
Doc1.field = foo bar gro
Doc2.field = foo bar gro mot slu
Now I would like to search using the terms foo bar gro
Problem 1:
I would like to express that at
Hi All,
Query: systems AND 2000
Results:558 total matching documents
I'm returning the document plus hits.score(i) * 100 but when the
relevance is examined in the User interface it doesn't seem to be
working.
E.g. 'rough' feedback in terms of occurrences
61.txt 18.356403 100%
Not sure what UI you are referring to, but you should have a look at
Searcher.explain() for giving you information about why a particular
document scored the way it does
-Grant
On Oct 31, 2007, at 2:14 PM, Tom Conlon wrote:
Hi All,
Query: systems AND 2000
Results:558 total
Hi
I have indexed this html document
=z1
<html>
<body>
<h1>zo zo zo zo zo zo zo zo zo zo zo zo </h1><br>
<h1>zo zo zo zo zo zo zo zo zo zo zo zo </h1><br>
<h1>zo zo zo zo zo zo zo zo zo zo zo zo </h1>
</body>
</html>
=z2=
<html>
<body>
<h1>zo
: I'm returning the document plus hits.score(i) * 100 but when the
NOTE: the score returned by Hits is not a percentage ... it is an
arbitrary number less than 1. It might be the raw score of the document
or it might be the result of dividing the raw score by the raw score
of the highest
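A small sketch of the normalization described above: each raw score is divided by the top raw score, so the best hit comes out as 1.0. As far as I recall, Hits only rescales when the top raw score is greater than 1, which is how the sketch behaves; treat that condition as an assumption. The class `ScoreNorm` and the sample scores are hypothetical.

```java
class ScoreNorm {
    // Divides every raw score by the top raw score when that top score exceeds 1;
    // otherwise the raw scores are returned unchanged.
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        float norm = max > 1.0f ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] scaled = normalize(new float[] {4.0f, 2.0f, 1.0f});
        for (float s : scaled) {
            System.out.println(s); // prints 1.0, 0.5, 0.25 on separate lines
        }
    }
}
```

This is why multiplying the score by 100 does not give a meaningful percentage: the top hit can show as 100% simply because it is the top hit, not because it is a perfect match.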