return single document from duplicated documents in index

2006-06-09 Thread Alan Boo
g'day, i've two questions. let's say the following is my index with 2 fields, title and contents: title | contents; beer | beer is good; beer | beer is good; cat | sleepy; dog | what a cute one!; beer

RE: Property comparison possible??

2006-06-09 Thread Chris Hostetter
: Is it possible to perform a search using fields instead of terms, eg. : like this sql: : SELECT col1, col2 : FROM table1 : WHERE col1 = col2 presumably col1 and col2 are untokenized fields? (otherwise equality is kind of vague) if you really wanted to add a constraint like this to an existing

Re: Adding Fields to Documents with UnStored Fields - crazy scheme?

2006-06-09 Thread Chris Hostetter
: 2. Recreating the index from scratch will require the moving of the : heavens and the earth. : : My crazy idea - can we add new Documents to the index with the Fields : we wish to add, and duplicate file IDs? i.e. an entry for file ID Foo : would consist of two Documents, : Document X:

Re: return single document from duplicated documents in index

2006-06-09 Thread Chris Hostetter
take a look at the HitCollector and Filter APIs .. you can implement any logic you want in either of those classes to restrict what results you get -- and the FieldCache gives you an easy way to check what the value of a particular indexed field is. storing the mappings of field value to best
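A minimal sketch of the idea in this reply, written as plain Java rather than against the Lucene API: the class name `BestPerValueCollector`, the `fieldValues` array (standing in for a FieldCache lookup by document id), and the sample titles and scores are all assumptions made for illustration. It keeps only the best-scoring document for each distinct field value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a hit collector: it is handed (docId, score) pairs
// and remembers only the best-scoring document for each distinct field value.
final class BestPerValueCollector {
    // Stand-in for a FieldCache lookup: fieldValues[docId] is the indexed value.
    private final String[] fieldValues;
    // Field value -> id of the best document seen so far for that value.
    private final Map<String, Integer> bestDoc = new LinkedHashMap<>();
    // Field value -> score of that best document.
    private final Map<String, Float> bestScore = new LinkedHashMap<>();

    BestPerValueCollector(String[] fieldValues) {
        this.fieldValues = fieldValues;
    }

    // Called once per matching document, in any order.
    void collect(int docId, float score) {
        String value = fieldValues[docId];
        Float previous = bestScore.get(value);
        if (previous == null || score > previous) {
            bestDoc.put(value, docId);
            bestScore.put(value, score);
        }
    }

    // One document id per distinct field value, in order of first appearance.
    List<Integer> docIds() {
        return new ArrayList<>(bestDoc.values());
    }

    public static void main(String[] args) {
        // Sample titles and scores are made up for the demonstration.
        String[] titles = {"beer", "beer", "cat", "dog", "beer"};
        BestPerValueCollector collector = new BestPerValueCollector(titles);
        float[] scores = {0.5f, 0.9f, 0.3f, 0.4f, 0.7f};
        for (int doc = 0; doc < scores.length; doc++) {
            collector.collect(doc, scores[doc]);
        }
        System.out.println(collector.docIds());
    }
}
```

In a real searcher the same bookkeeping would live inside whatever collector or filter class the Lucene version in use provides; the maps here only show the "field value to best document" mapping.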

Re: Different scoring mechanism

2006-06-09 Thread Chris Hostetter
:! If a document does not contain a queryterm this score can be larger : or smaller than 0 ! if a document doesn't contain a term, then the scorer for that query will never even try to score that document -- regardless of what your Similarity class looks like. if you really want this kind

Re: Adding Fields to Documents with UnStored Fields - crazy scheme?

2006-06-09 Thread Bob Arens
On Jun 9, 2006, at 2:10 AM, Chris Hostetter wrote: : 2. Recreating the index from scratch will require the moving of the : heavens and the earth. : : My crazy idea - can we add new Documents to the index with the Fields : we wish to add, and duplicate file IDs? i.e. an entry for file ID

RE: Property comparison possible??

2006-06-09 Thread Robert Haycock
He he, nice comparison! Cheers for the advice. Rob.

RE: Compound / non-compound index files and SIGKILL

2006-06-09 Thread Rob Staveley (Tom)
I am no longer a Jira virgin. http://issues.apache.org/jira/browse/LUCENE-594 Thanks again.

RE: Adding Fields to Documents with UnStored Fields - crazy scheme?

2006-06-09 Thread Robert Haycock
Hi Bob, No idea if this would work BUT... If the old index is optimized then you might be able to iterate through all the docs in your old index (sorted by doc id) and for each iteration add the corresponding doc to the new index so it has a matching doc id. The idea being that after searching

Re: Multisearch Problem

2006-06-09 Thread Dan Wiggin
My Lucene version is 1.4.3 and it has always worked with this. Someday I'll have to move to Lucene 2.0, but that isn't the problem. The problem is something like this: one index has something indexed and the other index is only created, without any documents. It's very strange because this

RE: Different scoring mechanism

2006-06-09 Thread Trieschnigg, R.B. (Dolf)
:! If a document does not contain a queryterm this score can be larger : or smaller than 0 ! if a document doesn't contain a term, then the scorer for that query will never even try to score that document -- regardless of what your Similarity class looks like. if you really want

Re: adding term information to Index

2006-06-09 Thread Grant Ingersoll
Hi Patricio, As of now, I don't think this is possible. However, we are slowly but surely working on similar problems. Please feel free to add your two cents to http://wiki.apache.org/jakarta-lucene/FlexibleIndexing as we are considering several new ideas related to making indexing more

combining two query calls in one?

2006-06-09 Thread zzzzz shalev
hey, i am using the pmsearcher to retrieve data from a number of ram indexes. i am calling my own search function which calls the indexsearcher.search method and returns the top 100 ids/scores. however, before returning the topdocs i start a separate thread which requeries the index

searching multiple indexes in multiple servers.

2006-06-09 Thread Omar Didi
Hi all, my index size has grown too much and I keep getting OutOfMemoryError after running a few searches. I am using all the RAM that the JVM allows me, 2.6GB. I am left with two solutions now: the easy and expensive solution is to upgrade the hardware to a 64-bit system and use more RAM.

COMMIT_LOCK_TIMEOUT - IndexSearcher/IndexReader

2006-06-09 Thread Michael Duval
Hi All, Has anyone else out there come across the shortcomings of the new COMMIT_LOCK_TIMEOUT with regard to searching on an actively updated index? It used to be a settable system property and therefore semi-dynamic across a system with multiple readers/searchers and one writer. I am aware

Numbertools and efficient sorting

2006-06-09 Thread Benjamin Stein
I have an integer field that I've indexed after converting to a string using NumberTools.longToString(). Now I want to sort my results using this field. Everything works when treating the field as a string, but is very slow and memory intensive. I want to use INT sorting instead, but these
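A small sketch of why a fixed-width string encoding keeps numbers in numeric order under string comparison, which is the property that makes string sorting work here. This is not the encoding that NumberTools.longToString() produces; the class `SortableNumberSketch`, its `WIDTH` constant, and the decimal zero-padding scheme are assumptions chosen for illustration.

```java
import java.util.Locale;

// Illustrative only: zero-padded decimal strings, NOT the NumberTools format.
final class SortableNumberSketch {
    // Ten digits are enough for any non-negative int (max 2147483647).
    private static final int WIDTH = 10;

    // Encode a non-negative int so that string order equals numeric order.
    static String encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    // A plain decimal string can be parsed straight back into an int.
    static int decode(String encoded) {
        return Integer.parseInt(encoded);
    }

    public static void main(String[] args) {
        // Unpadded, "9" sorts after "10"; padded, the order is numeric.
        System.out.println("9".compareTo("10") > 0);
        System.out.println(encode(9).compareTo(encode(10)) < 0);
        System.out.println(decode(encode(42)));
    }
}
```

The design trade-off is that the padded form remains a plain decimal number, so it can be read back with Integer.parseInt, at the cost of handling only non-negative values in this sketch.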

Indexing question

2006-06-09 Thread manumohedano
Hi All! I have a problem... When I index text documents in English, there is no problem, but when I index Spanish text documents (and they're big), a lot of information from the document doesn't get indexed (I suppose it is due to the Analyzer). However, I want to index ALL the strings in the

Re: Aggregating category hits

2006-06-09 Thread Peter Keegan
I compared Solr's DocSetHitCollector and counting bitset intersections to get facet counts with a different approach that uses a custom hit collector that tests each docid hit (bit) against each facet's bitset and increments a count in a histogram. My assumption was that for queries with few hits,
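The two counting strategies being compared can be sketched with java.util.BitSet. Neither method is Solr's code or the poster's collector; the class `FacetCountSketch`, its method names, and the sample bitsets are assumptions for illustration. Both methods return the same counts and differ only in how the work scales with the number of hits.

```java
import java.util.Arrays;
import java.util.BitSet;

// Illustrative comparison of two ways to count facet hits with bitsets.
final class FacetCountSketch {
    // Approach 1: intersect the hit set with each facet's bitset and count bits.
    static int[] countByIntersection(BitSet hits, BitSet[] facets) {
        int[] counts = new int[facets.length];
        for (int f = 0; f < facets.length; f++) {
            BitSet both = (BitSet) hits.clone(); // copy so the hit set is not modified
            both.and(facets[f]);
            counts[f] = both.cardinality();
        }
        return counts;
    }

    // Approach 2: walk the hits one by one, test each against every facet's
    // bitset, and increment that facet's slot in a histogram.
    static int[] countByHistogram(BitSet hits, BitSet[] facets) {
        int[] counts = new int[facets.length];
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            for (int f = 0; f < facets.length; f++) {
                if (facets[f].get(doc)) {
                    counts[f]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample data is made up: three hits and two facets.
        BitSet hits = new BitSet();
        hits.set(1); hits.set(3); hits.set(5);
        BitSet facetA = new BitSet();
        facetA.set(1); facetA.set(2); facetA.set(3);
        BitSet facetB = new BitSet();
        facetB.set(5); facetB.set(6);
        BitSet[] facets = {facetA, facetB};
        System.out.println(Arrays.toString(countByIntersection(hits, facets)));
        System.out.println(Arrays.toString(countByHistogram(hits, facets)));
    }
}
```

The intersection approach does work proportional to the size of the bitsets for every facet, while the histogram approach does work proportional to the number of hits times the number of facets.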

Re: Adding Fields to Documents with UnStored Fields - crazy scheme?

2006-06-09 Thread Bob Arens
If the old index is optimized then you might be able to iterate through all the docs in your old index (sorted by doc id) and for each iteration add the corresponding doc to the new index so it has a matching doc id. The idea being that after searching on one index you could use the doc

Re: Adding Fields to Documents with UnStored Fields - crazy scheme?

2006-06-09 Thread Chris Hostetter
: : would consist of two Documents, : : Document X: fileID:Foo, contents:unknown : : Document Y:fileID:Foo, title:Bar, url:www.baz.com, etc. : add another document with the same fileID and a title field and a url : field, and you search for contents:germany you're still going to get : back

RE: Different scoring mechanism

2006-06-09 Thread Chris Hostetter
: For example: a query containing two terms: fast, car, having : document frequencies 300.000 and 20.000 in the index respectively. In a : worst case scenario this would require 320.000 document scores to be : calculated. I am not really sure how lucene optimizes its search, but I : guess it

Re: Adding Fields to Documents with UnStored Fields - crazy scheme?

2006-06-09 Thread Bob Arens
: That kinda would be the point - contents:germany would get the same : fileIDs, but contents:germany title:medicine would (hopefully) give : us a more specific query. when you say contents:germany title:medicine i'm not sure if you are assuming that both clauses are mandatory or optional

Problems indexing large documents

2006-06-09 Thread manu mohedano
Hi All! I have a problem... When I index text documents in English, there is no problem, but when I index Spanish text documents (and they're big), a lot of information from the document doesn't get indexed (I suppose it is due to the Analyzer, but if the document is less than 400kb it works

Re: Problems indexing large documents

2006-06-09 Thread Daniel Naber
On Friday 09 June 2006 21:31, manu mohedano wrote: Hi All! I have a problem... When I index text documents in English, there is no problem, but when I index Spanish text documents (and they're big), a lot of information from the document doesn't get indexed Read the FAQ at

RE: Problems indexing large documents

2006-06-09 Thread Pasha Bizhan
Hi, From: manu mohedano [mailto:[EMAIL PROTECTED] Hi All! I have a problem... When I index text documents in English, there is no problem, but when I index Spanish text documents (and they're big), a lot of information from the document doesn't get indexed (I suppose it is due to the

Re: Adding Fields to Documents with UnStored Fields - crazy scheme?

2006-06-09 Thread Chris Hostetter
: fileID twice .. if you mean you want the list of fileIDs that match : both : clauses, you're not going to get any results back -- because no doc : with a : contents field is going to have a title field, and no doc with a title : field is going to have a contents field. : I'd want both

Re: Numbertools and efficient sorting

2006-06-09 Thread Chris Hostetter
: I have an integer field that I've indexed after converting to a string : using NumberTools.longToString(). : Now I want to sort my results using this field. Everything works when : treating the field as a string, but is very slow and memory intensive. : : I want to use INT sorting instead, but

Re: Adding Fields to Documents with UnStored Fields - crazy scheme?

2006-06-09 Thread Yonik Seeley
On 6/8/06, Bob Arens [EMAIL PROTECTED] wrote: I've been handed a legacy index containing Documents with two Fields; one is a file ID, the other is contents of the file. The contents field was added using UnStored. Now, we want to add fields. Oh, the humanity! My crazy idea - can we add new

Re: Indexing question

2006-06-09 Thread Erick Erickson
Couple of things. 1) You can use a different analyzer to NOT remove stopwords. SimpleAnalyzer comes to mind (though watch out for case). Look at Lucene in Action for an explanation of several analyzers that are available. 2) If memory serves, Lucene defaults to indexing only the first 10,000 words

Problems indexing large documents

2006-06-09 Thread manu mohedano
Problem solved! Thanks a lot, guys!!!