I was happy to take the hit of storing the text twice.
I have created an aggregate field called CONTENTS that has all the other
fields concatenated together.
I also created a list of the other fields (because they can vary from doc to
doc) in another field, FIELDLIST.
I search this field and for
Hi, that's exactly what I did :) Works perfectly.
Thanks
_gk
- Original Message -
From: Chris Hostetter [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Monday, January 30, 2006 5:56 AM
Subject: Re: deleting duplicate documents from my index
: Hi, I'm trying to delete
Hi,
Does anyone know if it is possible to show related searches with Lucene? For
example, if someone searched for car insurance, you could bring back the
results and related searches like these:
Automobile Insurance
Car Insurance Quote
Car Insurance Quotes
Auto Insurance
Cheap Car Insurance
Car
I cranked up the dial on my query tester and was able to get the rate up to
325 qps. Unfortunately, the machine died shortly thereafter (memory errors
:-( ). Hopefully it was just a coincidence. I haven't measured 64-bit
indexing speed yet.
Peter
On 1/29/06, Daniel Noll [EMAIL PROTECTED] wrote:
How do you search only certain documents? In the app I am writing, I know
all the documents that I want to search before I start searching with
Lucene. For example, I have documents A, B, C, X, Y, Z, so before I
start the search I know that I only want to search docs C, X, Y due to
other non-Lucene
Use BitSets to intersect the two queries. First knock up a HitCollector
that generates a bit set for the document set you want to search
(A, B, C, X, Y, Z). Then do another query generating a bit set for the
criteria on (C, X, Y). Then just intersect the two bit sets using the
and() method.
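A minimal sketch of Mike's suggestion, written against plain java.util.BitSet so it runs without Lucene on the classpath. DocCollector is a hypothetical stand-in that mirrors the shape of Lucene's HitCollector.collect(int doc, float score) callback; in real code you would subclass HitCollector and pass it to IndexSearcher.search(query, collector). The doc ids are made up.

```java
import java.util.BitSet;

/** Hypothetical stand-in for the shape of Lucene's HitCollector callback. */
interface DocCollector {
    void collect(int doc, float score);
}

/** Records every matching doc id in a BitSet. */
class BitSetCollector implements DocCollector {
    private final BitSet bits;

    BitSetCollector(int maxDoc) {
        this.bits = new BitSet(maxDoc);
    }

    @Override
    public void collect(int doc, float score) {
        bits.set(doc); // the score is irrelevant for set membership
    }

    BitSet bits() {
        return bits;
    }
}

class BitSetIntersect {
    /** Simulates running a query by feeding the given doc ids to a collector. */
    static BitSet collect(int maxDoc, int... matchingDocs) {
        BitSetCollector collector = new BitSetCollector(maxDoc);
        for (int doc : matchingDocs) {
            collector.collect(doc, 1.0f);
        }
        return collector.bits();
    }

    /** Returns the docs present in both sets, leaving the inputs untouched. */
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical doc ids: A=0, B=1, C=2, X=3, Y=4, Z=5
        BitSet allowed = collect(6, 2, 3, 4);       // C, X, Y from the non-Lucene criteria
        BitSet queryHits = collect(6, 0, 2, 4, 5);  // docs matching the Lucene query
        System.out.println(intersect(allowed, queryHits)); // {2, 4}
    }
}
```

In Lucene 1.x, a Filter whose bits(IndexReader) method returns the allowed set can achieve the same restriction through IndexSearcher.search(Query, Filter).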
Mike
There are a number of ways of doing this. One way I would suggest is simply to
store the CONTENTS fields and prefix each one with the field name. So instead of
storing a single CONTENTS field for a document, store a CONTENTS field for each
other field, with the field name prefixing each field value.
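A minimal sketch of the prefixing idea, assuming each document's fields are available as a Map<String, String>. The helper name prefixedContents and the "name:value" format are hypothetical choices; each returned string would be added to the Lucene Document as its own CONTENTS field (in Lucene 1.9-era code, something like new Field("CONTENTS", value, Field.Store.YES, Field.Index.TOKENIZED)). Whether the prefix stays attached to the value as one token depends on the analyzer you use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PrefixedContents {
    /**
     * Builds one CONTENTS value per source field, each prefixed with its
     * field name. The "name:value" format is a hypothetical choice.
     */
    static List<String> prefixedContents(Map<String, String> fields) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            values.add(e.getKey() + ":" + e.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("title", "lucene in action");
        fields.put("author", "smith");
        // Each string would become a separate CONTENTS field instance.
        for (String v : prefixedContents(fields)) {
            System.out.println(v);
        }
    }
}
```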
Hi Friends,
I am very new to the Lucene world! As this world is interesting to me,
I want to go deeper into it to realize its beauty.
Can you help me realize that beauty?
I have some questions:
1] Why do we use the BitSet class?
2] Is it required in Filtering / Sorting
Please do not cross-post your questions. Your questions are best
asked solely to java-user.
Erik
On Jan 30, 2006, at 9:23 AM, Vikas Khengare wrote:
Hi Friends,
I am very new to the Lucene world! As this world is interesting to me,
I want to go deeper into it
Michael,
Yes, you're describing pretty much what I was thinking of, but:
a) if I index function() as function() rather than function, does that
mean that if I search for function, then it won't be found? The problem is
that in some cases, the user will want to find function(), and in
Dmitry Goldenberg wrote:
a) if I index function() as function() rather than function, does that mean that if
I search for function, then it won't be found? The problem is that in some cases, the user will want to
find function(), and in some cases just function. Can I accommodate for
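One common way to accommodate both forms (a sketch under assumptions, not an answer given in this thread) is to index the bare name as an extra token at the same position, i.e. with a position increment of 0, which is the same trick the Lucene in Action synonym injector uses. In Lucene 1.x this would live in a custom TokenFilter that calls Token.setPositionIncrement(0) on the extra token; to stay runnable without Lucene, the hypothetical Tok class below only models term text and position increment, and the whitespace splitting is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

class DualFormTokens {
    /** Minimal token model: term text plus its position increment. */
    static final class Tok {
        final String text;
        final int posIncr;

        Tok(String text, int posIncr) {
            this.text = text;
            this.posIncr = posIncr;
        }

        @Override
        public String toString() {
            return text + "/" + posIncr;
        }
    }

    /**
     * Splits on whitespace; for any word ending in "()", also emits the bare
     * name at the same position (increment 0) so either form can match.
     */
    static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // happens only for blank input
            out.add(new Tok(word, 1));
            if (word.endsWith("()") && word.length() > 2) {
                out.add(new Tok(word.substring(0, word.length() - 2), 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("call function() here"));
        // [call/1, function()/1, function/0, here/1]
    }
}
```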
I have a timer task that periodically optimizes my search index. However,
this process is failing with:
java.io.IOException: Cannot overwrite: C:\04950_04959\deleteable.new
at
Hey Jim,
thanks a lot for the quick reply! Much appreciated.
I will look a little closer into what is done at C|Net; it seems more cost
efficient than what I'm currently doing ;)
However, I am not sure how scalable the solution is
if, for example, I received 20,000 results and I
A simple solution, if you only have 20,000 docs, is just to iterate
through the hits and count them up against each color etc.; this could be done
in a HitCollector. The balance here is performance vs. memory usage: if
you have a lot of users, I would go for a solution that was less
efficient but used a lot
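A minimal sketch of counting matches per color inside a collector. It assumes each doc's color has already been loaded into an array indexed by doc id (the role Lucene's FieldCache can play, e.g. FieldCache.DEFAULT.getStrings(reader, "color"), as suggested later in this thread), so the loop never loads stored documents. The colors array and the "color" field name are hypothetical; collect(int, float) only mirrors the HitCollector callback shape.

```java
import java.util.Map;
import java.util.TreeMap;

class ColorCounter {
    private final String[] colorByDoc;               // doc id -> color, loaded once per reader
    private final Map<String, Integer> counts = new TreeMap<>();

    ColorCounter(String[] colorByDoc) {
        this.colorByDoc = colorByDoc;
    }

    /** Same shape as HitCollector.collect: called once per matching doc. */
    void collect(int doc, float score) {
        String color = colorByDoc[doc];
        if (color != null) {                          // docs without a color are skipped
            counts.merge(color, 1, Integer::sum);
        }
    }

    Map<String, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical per-doc values
        String[] colors = {"red", "blue", "red", null, "green", "red"};
        ColorCounter counter = new ColorCounter(colors);
        for (int doc : new int[] {0, 2, 3, 4, 5}) {   // pretend these docs matched
            counter.collect(doc, 1.0f);
        }
        System.out.println(counter.counts()); // {green=1, red=3}
    }
}
```

The array costs memory proportional to the number of documents, which is the performance vs. memory trade-off mentioned above.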
: 1] Why do we use the BitSet class?
: 2] Is it required in Filtering / Sorting of results or to Index?
BitSet is a useful class for lots of things. The only time (that I know
of) where it is part of the public Lucene API is in the interaction
between a Filter and the IndexSearcher... so unless
: A simple solution if you only have 20,000 docs is just to iterate
: through the hits and count them up against each color etc.,
The one thing to avoid is reader.document() calls in such a tight loop.
This is always a killer.
The best way I've found is to create one bitset for all the matching
An approach like Mark is describing should be a lot more space
efficient than the BitSet intersection approach I described before, but
depending on how many groupings you want, I can imagine that it might be
slower in some cases.
Unfortunately, it also only works if the groupings you want are
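A minimal sketch of the BitSet-intersection counting approach referred to above, under assumptions: one BitSet per group value is built ahead of time (the data below is hypothetical), and each group's count is the cardinality of its set ANDed with the query's matches. It spends one bit per document per group, which is why it is less space efficient than a single per-doc value array.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class GroupCounts {
    /** Counts matches per group as |matches AND groupBits|, without mutating inputs. */
    static Map<String, Integer> count(BitSet matches, Map<String, BitSet> groups) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : groups.entrySet()) {
            BitSet tmp = (BitSet) e.getValue().clone();
            tmp.and(matches);
            counts.put(e.getKey(), tmp.cardinality());
        }
        return counts;
    }

    /** Convenience builder for a BitSet with the given doc ids set. */
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) b.set(d);
        return b;
    }

    public static void main(String[] args) {
        Map<String, BitSet> groups = new LinkedHashMap<>();
        groups.put("red", bits(0, 2, 5));   // hypothetical cached per-group doc sets
        groups.put("blue", bits(1));
        groups.put("green", bits(4));
        BitSet matches = bits(0, 2, 3, 4, 5);
        System.out.println(count(matches, groups)); // {red=3, blue=0, green=1}
    }
}
```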
For now, the best I could come up with is the following scheme:
SAMPLE DOCUMENTS:
Let's say there are four documents:
Doc1: st louis, missouri, usa
Doc2: st louis du ha ha, quebec, canada
Doc3: new york, NY, united states of america
Doc4: ny, usa
INDEX PHASE:
I am curious what the difference would be between searching for a number
versus a character.
I have a large index consisting of a few fields (so the index would look
something like: 123123123 my description my catalog).
Searching for 12* is much slower than searching for de*.
I don't have
Thanks for the advice, guys!
Currently, I am iterating through about 200-300 of the top docs and creating
the groups (so, as of now, the groups are partial). My response time HAS to be
at most 500-600 ms (query + groupings) or my company will probably go with a
commercial search
: Currently, I am iterating through about 200-300 of the top docs and
: creating the groups (so, as of now, the groups are partial). My
: response time HAS to be at most 500-600 ms (query + groupings) or my
: company will probably go with a commercial search engine such as FAST or
:
PrefixQuery is implemented as a BooleanQuery using term expansion. What
that means is that a prefix query on a common prefix is much more
expensive than a prefix query on a less common prefix, not just in terms
of the number of documents that match, but because of the number of terms
that match.
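The term-expansion point can be illustrated with a small sketch that models the term dictionary as a sorted set (the terms are invented) and counts how many distinct terms a prefix expands to. Each expanded term becomes one clause of the rewritten BooleanQuery, so cost grows with that count; in a field full of numbers such as 123123123, a short numeric prefix like 12 can match far more distinct terms than de does in a text field. In Lucene 1.x, if the expansion exceeds BooleanQuery's maximum clause count (1024 by default), the search throws BooleanQuery.TooManyClauses.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixExpansion {
    /**
     * Returns the terms that start with the prefix. The sorted dictionary lets
     * us jump to the first candidate, but every matching term is still visited.
     */
    static SortedSet<String> expand(TreeSet<String> terms, String prefix) {
        SortedSet<String> out = new TreeSet<>();
        for (String t : terms.tailSet(prefix)) {
            if (!t.startsWith(prefix)) break; // past the end of the prefix range
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
            "120", "121", "1210", "123123123", "129", "catalog", "description", "my"));
        System.out.println(expand(terms, "12").size()); // 5
        System.out.println(expand(terms, "de").size()); // 1
    }
}
```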
Hey Chris,
I was using the hits.doc() method while iterating...
You've given me some hope!! I will look into the FieldCache.
Chris Hostetter [EMAIL PROTECTED] wrote:
: Currently, I am iterating through about 200-300 of the top docs and
: creating the groups (so, as of now, the groups
Have you considered evaluating doc-score thresholds for limiting your
results? Since the perfect answers to these situations lie in the constant
tweaking and twiddling of analysis and tokenization, one way I've found to
help is to evaluate result scores. In your Ontario CA example, limiting
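A minimal sketch of limiting results by score, assuming a relative cut-off: hits scoring below a fixed fraction of the top score are dropped. The 0.5 fraction and the sample scores are hypothetical; as the message says, a workable threshold comes from tweaking against your own analysis and data.

```java
import java.util.ArrayList;
import java.util.List;

class ScoreThreshold {
    /**
     * Keeps leading hits whose score is at least `fraction` of the top score.
     * Assumes scores are sorted in descending order, as search results are.
     */
    static List<Float> keepAboveFraction(float[] sortedScores, float fraction) {
        List<Float> kept = new ArrayList<>();
        if (sortedScores.length == 0) return kept;
        float cutoff = sortedScores[0] * fraction;
        for (float s : sortedScores) {
            if (s < cutoff) break; // descending order: everything after is lower
            kept.add(s);
        }
        return kept;
    }

    public static void main(String[] args) {
        float[] scores = {0.92f, 0.88f, 0.51f, 0.30f, 0.12f}; // hypothetical hit scores
        System.out.println(keepAboveFraction(scores, 0.5f)); // [0.92, 0.88, 0.51]
    }
}
```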
Actually, I arrived at a very similar solution for indexing as you did,
but I've been away from a connection, so I haven't been able to post it
here.
Essentially I'm adding the items as you suggest, but I've built a
synonym injector (actually I'm just using the one from Lucene in
Action) to
I have thought about that. I couldn't figure out a way to make it work.
Fortunately, I have managed to solve the problem (excepting prefix or
wildcard searches) which is very close to what Rajesh suggested (also
see my response to his response).
Thanks for taking a look.
Colin
Hello Lucene members,
I tried to do reindexing using the Lifecycle interface of Hibernate,
but I'm stuck on the implementation part of this interface.
I wrote the code for it, but now I'm stuck on the concepts of Hibernate.
It uses methods like