RE: Searching over more than one Fields

2006-01-30 Thread Gwyn Carwardine
I was happy to take the hit of storing the text twice. I have created an aggregate field called CONTENTS that has all the other fields concatenated together. I also created a list of the other fields (because they can vary from doc to doc) in another field, FIELDLIST. I search this field and for

Re: deleting duplicate documents from my index

2006-01-30 Thread gekkokid
hi, that's exactly what i did :) works perfectly, thanks _gk - Original Message - From: Chris Hostetter [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Monday, January 30, 2006 5:56 AM Subject: Re: deleting duplicate documents from my index : Hi, im trying to delete

Related searches

2006-01-30 Thread Leon Chaddock
Hi, does anyone know if it is possible to show related searches with Lucene? For example, if someone searched for car insurance, you could bring back the results and related searches like these: Automobile Insurance, Car Insurance Quote, Car Insurance Quotes, Auto Insurance, Cheap Car Insurance, Car

Re: Throughput doesn't increase when using more concurrent threads

2006-01-30 Thread Peter Keegan
I cranked up the dial on my query tester and was able to get the rate up to 325 qps. Unfortunately, the machine died shortly thereafter (memory errors :-( ) Hopefully, it was just a coincidence. I haven't measured 64-bit indexing speed, yet. Peter On 1/29/06, Daniel Noll [EMAIL PROTECTED] wrote:

searching specific documents

2006-01-30 Thread Jebus
How do you search only certain documents? In the app I am writing, before I start searching with Lucene I know all the documents that I want to search. For example, I have documents A,B,C,X,Y,Z, so before I start the search I know that I only want to search docs C,X,Y due to other non-Lucene

RE: searching specific documents

2006-01-30 Thread Mike Streeton
Use BitSets to intersect the two queries. First knock up a HitCollector that generates a bit set for the document set you want to search (A,B,C,X,Y,Z). Then do another query generating a bit set for the criteria on (C,X,Y). Then just intersect the two bit sets using the and method. Mike
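
The intersection step described above can be sketched with plain `java.util.BitSet`, leaving Lucene out entirely. This is only a minimal sketch, assuming each bit position stands for a document number; the class name, method name, and doc numbers are hypothetical, and in a real application the bits would be set from inside a HitCollector rather than by hand.

```java
import java.util.BitSet;

// Hypothetical helper, not part of Lucene: intersects two sets of
// document numbers that are held as BitSets.
class BitSetIntersection {

    // Returns a new BitSet holding only the doc numbers present in both inputs.
    // BitSet.and() modifies its receiver, so the work is done on a copy.
    static BitSet intersect(BitSet allowedDocs, BitSet matchingDocs) {
        BitSet result = (BitSet) allowedDocs.clone();
        result.and(matchingDocs);
        return result;
    }

    public static void main(String[] args) {
        // Assumed doc numbers for the documents the application allows.
        BitSet allowed = new BitSet();
        allowed.set(2);
        allowed.set(3);
        allowed.set(4);

        // Assumed doc numbers that the query itself matched.
        BitSet matching = new BitSet();
        matching.set(0);
        matching.set(3);
        matching.set(4);
        matching.set(5);

        System.out.println(intersect(allowed, matching)); // prints {3, 4}
    }
}
```

Copying before calling `and` keeps both input sets reusable, which matters if the "allowed" set is cached across searches.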

RE: Searching over more than one Fields

2006-01-30 Thread Mike Streeton
There are a number of ways of doing this. One way I would suggest is simply to store the CONTENTS fields and prefix it with the field name. So instead of storing a single CONTENTS field for a document, store a CONTENTS field for each other field with the field name prefixing each field value.

Why do we use BitSet class ?

2006-01-30 Thread Vikas Khengare
Hi Friends I am very New to Lucene World !!! As this world is interesting to me So I want to go in deep level of it to realize the beauty of it. So can you help me to realize that beauty ? I have question 1] Why do use BitSet Class ? 2] Is it required in Filtering / Sorting

Re: Why do we use BitSet class ?

2006-01-30 Thread Erik Hatcher
Please do not cross-post your questions. Your questions are best asked solely to java-user. Erik On Jan 30, 2006, at 9:23 AM, Vikas Khengare wrote: Hi Friends I am very New to Lucene World !!! As this world is interesting to me So I want to go in deep level of it

RE: How to find function() - ?

2006-01-30 Thread Dmitry Goldenberg
Michael, Yes, you're describing pretty much what I was thinking of but -- a) if I index function() as function() rather than function, does that mean that if I search for function, then it won't be found? -- the problem is that in some cases, the user will want to find function(), and in

Re: How to find function() - ?

2006-01-30 Thread Michael D. Curtin
Dmitry Goldenberg wrote: a) if I index function() as function() rather than function, does that mean that if I search for function, then it won't be found? -- the problem is that in some cases, the user will want to find function(), and in some cases just function -- can I accommodate for

Unable to optimize index: cannot delete deletable.new

2006-01-30 Thread Dalton, Jeffery
I have a periodic process that runs as a timer task that periodically optimizes my search index. However, I am having difficulties with this process failing: java.io.IOException: Cannot overwrite: C:\04950_04959\deleteable.new at

Re: grouping results by fields

2006-01-30 Thread zzzzz shalev
hey Jim, thanks a lot for the quick reply! much appreciated. i will look a little closer into what is done in C|Net, seems more cost efficient than what i'm currently doing ;) however i am not sure how scalable the solution is if, for example, i received 20,000 results and i

RE: grouping results by fields

2006-01-30 Thread Mike Streeton
A simple solution if you only have 20,000 docs is just to iterate through the hits and count them up against each color etc, this could be in a HitCollector. The balance here is performance vs memory usage, if you have a lot of users I would go for a solution that was less efficient but used a lot
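
The counting idea above can be sketched in plain Java. This is a rough illustration, not Lucene code: it assumes the group value for every document number has already been loaded into an array (`valueByDoc`, a hypothetical name), and that `matchingDocs` holds the doc numbers a HitCollector would have been handed. The class and method names are also made up for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counts matching documents per group value.
class GroupCounter {

    // matchingDocs: doc numbers the query matched.
    // valueByDoc:   group value (e.g. color) for every doc number, loaded once up front.
    static Map<String, Integer> countByValue(int[] matchingDocs, String[] valueByDoc) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : matchingDocs) {
            // Add 1 to the running count for this document's group value.
            counts.merge(valueByDoc[doc], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] colorByDoc = {"red", "blue", "red", "green", "blue", "red"};
        int[] hits = {0, 2, 3, 5};
        System.out.println(countByValue(hits, colorByDoc)); // prints {green=1, red=3}
    }
}
```

The loop only does an array lookup per hit, so its cost grows with the number of hits rather than with the size of the stored documents.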

Re: Why do we use BitSet class ?

2006-01-30 Thread Chris Hostetter
: 1] Why do use BitSet Class ? : 2] Is it required in Filtering / Sorting of results or to Index ? BitSet is a useful class for lots of things. the only time (that i know of) where it is part of the public lucene API is in the interaction between a Filter and the IndexSearcher ... so unless

RE: grouping results by fields

2006-01-30 Thread mark harwood
A simple solution if you only have 20,000 docs is just to iterate through the hits and count them up against each color etc, The one thing to avoid is reader.document() calls in such a tight loop. This is always a killer. The best way I've found is to create one bitset for all the matching

RE: grouping results by fields

2006-01-30 Thread Chris Hostetter
An approach like mark is describing should be a lot more space efficient than the BitSet intersection approach i described before, but depending on how many groupings you want, i can imagine that it might be slower in some cases. Unfortunately, it also only works if the grouping you want are

Re: Help with indexing and query strategy

2006-01-30 Thread Rajesh Munavalli
For now, the best I could come up with is the following scheme. SAMPLE DOCUMENTS: Let's say there are four documents: Doc1: st louis, missouri, usa Doc2: st louis du ha ha, quebec, canada Doc3: new york, NY, united states of america Doc4: ny, usa INDEX PHASE:

Number Searches vs Character

2006-01-30 Thread Aigner, Thomas
I am curious what would be the difference between searching for a number versus a character. I have a large index consisting of a few fields (so the index would look something like: 123123123 my description my catalog). Searching for 12* is much slower than searching for de*. I don't have

RE: grouping results by fields

2006-01-30 Thread zzzzz shalev
thanks for the advice guys! currently , i am iterating through about 200-300 of the top docs and creating the groups (so, as of now, the groups are partial) , my response time HAS to be at most 500-600 milli (query + groupings) or my company will probably go with a commercial search

RE: grouping results by fields

2006-01-30 Thread Chris Hostetter
: currently , i am iterating through about 200-300 of the top docs and : creating the groups (so, as of now, the groups are partial) , my : response time HAS to be at most 500-600 milli (query + groupings) or my : company will probably go with a commercial search engine such as FAST or :

Re: Number Searches vs Character

2006-01-30 Thread Chris Hostetter
PrefixQuery is implemented as a BooleanQuery using term expansion. what that means is that a prefix query on a common prefix is much more expensive than a prefix query on a less common prefix. not just in terms of the number of documents that match, but because of the number of terms that match
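
To make the term-expansion cost concrete, here is a small sketch in plain Java. It is not Lucene's implementation; it only assumes that the index keeps its terms in sorted order, modelled here with a `TreeSet`, and that a prefix query has to visit every term sharing the prefix. The class name, method name, and sample terms are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: shows how many terms a prefix expands to
// in a sorted term dictionary.
class PrefixExpansion {

    // Collects every term that starts with the given prefix.
    // In sorted order those terms are contiguous, starting at the first
    // term >= prefix, so the scan can stop at the first non-match.
    static List<String> expand(SortedSet<String> terms, String prefix) {
        List<String> expanded = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            expanded.add(term);
        }
        return expanded;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        // Assumed sample terms: many distinct numbers, few distinct words.
        terms.add("123123123");
        terms.add("124000111");
        terms.add("125550000");
        terms.add("129999999");
        terms.add("catalog");
        terms.add("description");
        terms.add("my");

        System.out.println(expand(terms, "12").size()); // prints 4
        System.out.println(expand(terms, "de").size()); // prints 1
    }
}
```

With numeric identifiers nearly every document contributes its own term, so a short numeric prefix tends to expand to far more terms than a short word prefix does.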

RE: grouping results by fields

2006-01-30 Thread zzzzz shalev
hey chris, i was using the hits.doc method while iterating,,, you've given me some hope!! i will look into the FieldCache Chris Hostetter [EMAIL PROTECTED] wrote: : currently , i am iterating through about 200-300 of the top docs and : creating the groups (so, as of now, the groups

Re: Help with indexing and query strategy

2006-01-30 Thread Jeff Rodenburg
Have you considered evaluating doc-score thresholds for limiting your results? Since the perfect answers to these situations lie in the constant tweaking and twiddling of analysis and tokenization, one way I've found to help is to evaluate result scores. In your Ontario CA example, limiting

RE: Help with indexing and query strategy

2006-01-30 Thread Colin Young
Actually, I arrived at a very similar solution for indexing as you did, but I've been away from a connection, so I haven't been able to post it here. Essentially I'm adding the items as you suggest, but I've built a synonym injector (actually I'm just using the one from Lucene in Action) to

RE: Help with indexing and query strategy

2006-01-30 Thread Colin Young
I have thought about that. I couldn't figure out a way to make it work. Fortunately, I have managed to solve the problem (excepting prefix or wildcard searches) which is very close to what Rajesh suggested (also see my response to his response). Thanks for taking a look. Colin -Original

Reindexing

2006-01-30 Thread revati joshi
Hello Lucene members. I tried to do reindexing using the Lifecycle interface of Hibernate, but I'm stuck on the implementation part of this interface. I wrote the code for it but I'm now stuck on the concept of Hibernate. It uses methods like