Re: Document boost, is it working?

2007-10-31 Thread Andrzej Bialecki
Bruno Dery wrote: Thanks for the help, you're right, your example works. However, looking in Luke I also see only ones (1 1 1) as the document boost. Then perhaps this value should be removed from Luke's display... because it will always read 1, and that is a correct value (see below). I

RE: Threading Indexing Processes : Can we write concurrently to Index?

2007-10-31 Thread Michael McCandless
Just to clarify here: yes, you really should have a single JVM with a single instance of IndexWriter, but use multiple threads calling IndexWriter.addDocument. Under the hood, IndexWriter can make use of a lot of concurrency, so you should see a substantial gain in indexing throughput if you use
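A minimal sketch of that pattern, one shared writer fed by several threads, using only java.util.concurrent. So that it runs without Lucene on the classpath, DocumentSink and CollectingSink are hypothetical stand-ins for IndexWriter; in a real indexer each worker would instead call addDocument on the single shared IndexWriter instance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedWriterSketch {

    // Hypothetical stand-in for IndexWriter. The only property assumed here is
    // that addDocument may safely be called from several threads at once.
    public interface DocumentSink {
        void addDocument(String doc);
    }

    // Thread-safe in-memory sink used instead of a real IndexWriter.
    public static class CollectingSink implements DocumentSink {
        final List<String> docs = Collections.synchronizedList(new ArrayList<>());

        @Override
        public void addDocument(String doc) {
            docs.add(doc);
        }
    }

    // Submits every document to ONE shared sink from a fixed pool of worker
    // threads, then waits (up to a minute) for all of them to finish.
    static boolean indexConcurrently(DocumentSink sink, List<String> docs, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String doc : docs) {
            pool.submit(() -> sink.addDocument(doc));
        }
        pool.shutdown();
        return pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            docs.add("doc-" + i);
        }
        CollectingSink sink = new CollectingSink();
        indexConcurrently(sink, docs, 4);
        System.out.println(sink.docs.size()); // 100
    }
}
```

The pool size (4 here) is an arbitrary choice for illustration; the right number of threads depends on your hardware and documents.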

Get term id from dictionary

2007-10-31 Thread Ilias Flaounas
Dear experts, I need to store and index a string of text in Lucene, and later I want to get the ID of each term inside this string. Is that possible? How can I do it? I want a unique term (in my case, a word) to ID association. I know that if I delete a document, the dictionary changes. Does

Re: Get term id from dictionary

2007-10-31 Thread Mark Miller
The id does change. You need to index your own id field with the document. Ilias Flaounas wrote: Dear experts, I need to store and index a string of text into Lucene, and later I want to get the Id of each term inside this string. Is it possible? How can I do that? I want a unique

optimizing only during certain time

2007-10-31 Thread jm
Hi, I understand optimizing can take longer when the index is bigger, so it might take a while when the index is huge. I think I remember seeing something on the Lucene list about optimizing not all the way to the optimal state, only to a less-than-optimal one, but in less time. Is that correct? Does

Looking for a keen Lucene developer/consultant ...

2007-10-31 Thread Kiffin Gish
Hi there! Currently I am looking for an expert developer/consultant who can assist my development team with an implementation of Lucene for an exciting and innovative project in Amsterdam, Holland. This is for a scalable, robust and high-performing web-based system running in a Java EE

Re: Get term id from dictionary

2007-10-31 Thread Ilias Flaounas
I want IDs for the terms (words), not the documents! Also, I need the same ID for a word if it appears in more than one document. Example: Doc1: "The sea is blue", Doc2: "Sky is blue". For these two docs the dictionary would be [the]-1 [sea]-2 [is]-3 [blue]-4 [sky]-5. So I want to represent
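Since Lucene does not appear to keep stable per-term IDs (see Mark Miller's reply in this thread), one option is to maintain the word-to-ID mapping yourself, outside the index. The sketch below assumes a naive lowercase-and-split-on-whitespace tokenizer as a stand-in for a real Analyzer; TermIdDictionary is a hypothetical class and does not persist the mapping anywhere.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class TermIdDictionary {

    // term -> ID, kept in insertion order so IDs reflect first appearance.
    private final Map<String, Integer> ids = new LinkedHashMap<>();

    // Returns the existing ID for a term, or assigns the next free one (1-based).
    public int idFor(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(key, id);
        }
        return id;
    }

    // Splits a document on whitespace (a deliberately naive tokenizer)
    // and converts it into a sequence of term IDs.
    public int[] encode(String text) {
        String[] tokens = text.trim().split("\\s+");
        int[] out = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = idFor(tokens[i]);
        }
        return out;
    }

    public Map<String, Integer> asMap() {
        return ids;
    }

    public static void main(String[] args) {
        TermIdDictionary dict = new TermIdDictionary();
        System.out.println(Arrays.toString(dict.encode("The sea is blue"))); // [1, 2, 3, 4]
        System.out.println(Arrays.toString(dict.encode("Sky is blue")));     // [5, 3, 4]
        System.out.println(dict.asMap()); // {the=1, sea=2, is=3, blue=4, sky=5}
    }
}
```

Because the map lives outside the index, deleting a document does not renumber anything; IDs are only ever added.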

Re: Looking for a keen Lucene developer/consultant ...

2007-10-31 Thread Mark Miller
If you haven't seen it, a good source for this is here: http://wiki.apache.org/lucene-java/Support Though that's not as nice as having people contact you :) Kiffin Gish wrote: Hi there! Currently I am looking for an expert developer/consultant who can assist my development team with an

Re: Get term id from dictionary

2007-10-31 Thread Mark Miller
You can check out the file format of Lucene's term dictionary here: http://lucene.apache.org/java/docs/fileformats.html#Term%20Dictionary That might give you some insight. As far as I can tell, though, Lucene does not keep IDs for terms... just for documents... and then the ID is really just an

2/3 of terms matched + coverage filter

2007-10-31 Thread Tobias Hill
My documents all have a field with a variable number of terms (but rather few): Doc1.field = foo bar gro, Doc2.field = foo bar gro mot slu. Now I would like to search using the terms foo bar gro. Problem 1: I'd like to express that at least two of the three terms must match. Do I have to construct
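The "at least two of the three terms" requirement is a minimum-should-match rule. The plain-Java sketch below only illustrates the counting logic on the example fields; it is not Lucene's query API, and doc3 is a hypothetical extra document. Within Lucene itself, BooleanQuery.setMinimumNumberShouldMatch(int) appears to target this case and is worth checking in your version's javadocs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MinShouldMatch {

    // Counts how many distinct query terms occur among the field's terms.
    static int overlap(List<String> queryTerms, List<String> fieldTerms) {
        Set<String> field = new HashSet<>(fieldTerms);
        int count = 0;
        for (String t : new HashSet<>(queryTerms)) {
            if (field.contains(t)) {
                count++;
            }
        }
        return count;
    }

    // True when at least minMatch of the query terms are present in the field.
    static boolean matches(List<String> queryTerms, List<String> fieldTerms, int minMatch) {
        return overlap(queryTerms, fieldTerms) >= minMatch;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("foo", "bar", "gro");
        List<String> doc1 = Arrays.asList("foo", "bar", "gro");
        List<String> doc2 = Arrays.asList("foo", "bar", "gro", "mot", "slu");
        List<String> doc3 = Arrays.asList("foo", "mot"); // hypothetical extra document
        System.out.println(matches(query, doc1, 2)); // true
        System.out.println(matches(query, doc2, 2)); // true
        System.out.println(matches(query, doc3, 2)); // false
    }
}
```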

Re: 2/3 of terms matched + coverage filter

2007-10-31 Thread Donna L Gresh
I am not an expert, but I think you can solve problem 1 by overriding the coord function in the Similarity class: 1. coord(q,d) is a score factor based on how many of the query terms are found in the specified document. Typically, a document that contains more of the query's terms will
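For reference, DefaultSimilarity computes coord as overlap / (float) maxOverlap, i.e. the fraction of query terms found in the document. The sketch below reproduces that arithmetic as plain static methods (so it runs without Lucene) and adds a hypothetical thresholded variant of the kind an overriding Similarity.coord(int, int) might implement.

```java
public class CoordSketch {

    // Same arithmetic as the default coord factor: fraction of query terms matched.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical variant: returns 0 when fewer than minMatch query terms
    // matched, otherwise falls back to the default fraction.
    static float thresholdCoord(int overlap, int maxOverlap, int minMatch) {
        return overlap < minMatch ? 0.0f : defaultCoord(overlap, maxOverlap);
    }

    public static void main(String[] args) {
        System.out.println(defaultCoord(3, 3));      // 1.0
        System.out.println(defaultCoord(2, 3));      // 0.6666667
        System.out.println(thresholdCoord(1, 3, 2)); // 0.0
        System.out.println(thresholdCoord(2, 3, 2)); // 0.6666667
    }
}
```

A zero coord zeroes the document's score, but that alone does not guarantee the document is dropped from the results, so test this against your own searches.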

Best way to count tokens

2007-10-31 Thread Cool Coder
Hi Group, I need to display a list of the tokens (tags) on my site that have the most occurrences in my index. One way I can think of is to keep track of all tokens during analysis and display them accordingly. Is there any other way? E.g. if I want to display tokens in order of
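The approach described above, tallying tokens as they are analyzed and sorting by count, can be sketched in plain Java. The whitespace tokenizer and lowercasing are assumptions standing in for your real Analyzer, and TokenCounter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TokenCounter {

    // Counts lowercase, whitespace-separated tokens across all texts.
    static Map<String, Integer> count(List<String> texts) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : texts) {
            for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Returns the n most frequent tokens, ties broken alphabetically.
    static List<String> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("java lucene search", "lucene index", "lucene java");
        System.out.println(topN(count(docs), 2)); // [lucene, java]
    }
}
```

An alternative is to walk the index's term enumeration (IndexReader.terms() and TermEnum.docFreq() in Lucene 2.x), but that gives the number of documents containing each term, not total occurrences.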

Problems while indexing

2007-10-31 Thread Jan F.
Hello to you all! I've programmed a portlet search solution using Lucene. Now that our new website is shortly before release, the file volume has increased fast. My Lucene-based indexing program worked fine until the number of files increased so much. Now it runs for about 10 minutes and then gives

Re: Problems while indexing

2007-10-31 Thread Chris Lu
Hi Jan, you really need to be more specific about your configuration and error log. Lucene has certainly been used on many large websites.

Re: Problems while indexing

2007-10-31 Thread Mark Miller
You have no stack trace? Come on, man... you must be able to get a stack trace :) There is no way to tell what is causing that very generic error. I would say, though: it's certainly not the size of your index. Out of curiosity, how big is it? I'll bet my two broken wrists that's not the

Re: Best way to count tokens

2007-10-31 Thread Karl Wettin
On 31 Oct 2007, at 15:18, Cool Coder wrote: Hi Group, I need to display a list of the tokens (tags) on my site that have the most occurrences in my index. One way I can think of is to keep track of all tokens during analysis and display them accordingly. Is there any other way?

RE: Document boost, is it working?

2007-10-31 Thread Bruno Dery
Thanks! I also noticed there is a mention of this in the documentation of Document.getBoost(): Note: This value is not stored directly with the document in the index. Documents returned from IndexReader.document(int) and Hits.doc(int) may thus not have the same value present as when this document

Re: 2/3 of terms matched + coverage filter

2007-10-31 Thread Paul Elschot
On Wednesday 31 October 2007 14:51:12 Tobias Hill wrote: My documents all have a field with a variable number of terms (but rather few): Doc1.field = foo bar gro, Doc2.field = foo bar gro mot slu. Now I would like to search using the terms foo bar gro. Problem 1: I'd like to express that at

Hits.score mystery

2007-10-31 Thread Tom Conlon
Hi All, Query: systems AND 2000. Results: 558 total matching documents. I'm returning the document plus hits.score(i) * 100, but when the relevance is examined in the user interface it doesn't seem to be working. E.g. 'rough' feedback in terms of occurrences: 61.txt 18.356403 100%

Re: Hits.score mystery

2007-10-31 Thread Grant Ingersoll
Not sure what UI you are referring to, but you should have a look at Searcher.explain(), which gives you information about why a particular document scored the way it did. -Grant On Oct 31, 2007, at 2:14 PM, Tom Conlon wrote: Hi All, Query: systems AND 2000 Results: 558 total

problem understanding the hits.score

2007-10-31 Thread Jamal jamalator
Hi, I have indexed this HTML document: =z1= <html> <body> <h1>zo zo zo zo zo zo zo zo zo zo zo zo</h1><br> <h1>zo zo zo zo zo zo zo zo zo zo zo zo</h1><br> <h1>zo zo zo zo zo zo zo zo zo zo zo zo</h1> </body> </html> =z2= <html> <body> <h1>zo

Re: Hits.score mystery

2007-10-31 Thread Chris Hostetter
: I'm returning the document plus hits.score(i) * 100 but when the NOTE: the score returned by Hits is not a percentage... it is an arbitrary number less than 1. It might be the raw score of the document or it might be the result of dividing the raw score by the raw score of the highest
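To make the second case concrete, here is a sketch of dividing every raw score by the highest one. It assumes, based on my reading of Hits rather than anything stated in this thread, that the division only happens when the top raw score exceeds 1; ScoreNormSketch is a hypothetical name.

```java
import java.util.Arrays;

public class ScoreNormSketch {

    // Scales every raw score by 1/max, but only when the top score exceeds 1,
    // so the best hit maps to 1.0 and the others keep their relative order.
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        float norm = max > 1.0f ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalize(new float[] {4.0f, 2.0f, 1.0f}))); // [1.0, 0.5, 0.25]
        System.out.println(Arrays.toString(normalize(new float[] {0.8f, 0.4f})));       // [0.8, 0.4]
    }
}
```

With the second input the top hit stays at 0.8, so multiplying by 100 would show 80% even for the best match; the value reflects relative standing within one result set, not an absolute percentage.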