Apologies for replying to myself, but another nice side effect of this fix is that it virtually eliminates the potential for an OutOfMemoryError, a problem I encountered on extremely large fields (over 10,000 terms) while I was profiling the RangeQuery class.
I can get into specifics if need be. Any thoughts?
alex
On Fri, 2002-11-08 at 15:54, Alex Winston wrote:
Thanks for the reply, and my apologies for not explaining myself very clearly; it has been a long day.
You described exactly our situation. Unfortunately, this is not an option, because we also want to have multiple ranges for each document. There is a possible extension of what you suggested, but that is a last resort. Kinda crazy, I know, but you have to meet requirements :).
But I also had a thought while I was looking through the Lucene code, and any comments are welcome.
I may be very mistaken, since it has been a long day, but if you look at the current CVS version of RangeQuery, it appears that even after a match is found it continues to iterate over the terms within the field, and in my case there are thousands of them. If I add a break after a match has been found, the search appears to improve by about an order of magnitude on average; my math has left me, so I cannot give a theoretical argument at the moment. I have unit tested the change on my side and on the Lucene side, and it works. One hard example: a query went from 20 seconds to 0.5 seconds. Any initial thoughts on whether there is a case where this would not work?
Beginning at line 164:
    TermQuery tq = new TermQuery(term);   // found a match
    tq.setBoost(boost);                   // set the boost
    q.add(tq, false, false);              // add to q
    break;                                // ADDED!
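A simplified, stand-alone model of that loop can help when asking whether the early exit is safe. It assumes the rewrite walks a sorted term dictionary starting at the lower bound and adds one clause per term until it passes the upper bound. It uses a `TreeSet` in place of Lucene's term enumerator, and `RangeExpansionSketch` and `expand` are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class RangeExpansionSketch {

    // Simplified model (not Lucene's code): collect every term t with
    // lower <= t <= upper from a sorted term dictionary, one "clause" per term.
    // If stopAfterFirstMatch is true, the loop exits after the first match,
    // which models the effect of the added "break;" line.
    static List<String> expand(SortedSet<String> terms, String lower, String upper,
                               boolean stopAfterFirstMatch) {
        List<String> clauses = new ArrayList<>();
        // tailSet(lower) starts the walk at the first term >= lower,
        // similar to seeking a term enumerator to the lower bound.
        for (String term : terms.tailSet(lower)) {
            if (term.compareTo(upper) > 0) {
                break; // past the upper bound: no later term can be in range
            }
            clauses.add(term); // stands in for q.add(tq, false, false)
            if (stopAfterFirstMatch) {
                break; // the proposed early exit
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(List.of("a", "b", "c", "d", "e"));
        System.out.println(expand(terms, "b", "d", false)); // [b, c, d]
        System.out.println(expand(terms, "b", "d", true));  // [b]
    }
}
```

In this model the extra `break` keeps only the first in-range term (`[b]` instead of `[b, c, d]`), so a document indexed only under a later term in the range would no longer be matched. Whether the real RangeQuery behaves the same way depends on the surrounding loop, which the excerpt does not show.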
On Fri, 2002-11-08 at 15:09, Mike Barry wrote:
Alex,
It is rather confusing. It sounds like you've indexed a field that can be between two values (let's say E-J), and then when you have a search term such as G you want the docs containing E-J (or A-H or F-K, but not A-C or J-Z).
Just off the top of my head, but could you index the upper and lower bounds as separate fields, and then when you search do a compound query:
    lower_bound:{ - search_term } AND upper_bound:{ search_term - }
Just a thought.
-MikeB.
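A minimal sketch of the two-field check, written as plain Java comparisons rather than Lucene queries, assuming chapter and verse values below 1000 and inclusive bounds. Range queries compare term text, so the hypothetical `key` helper zero-pads both numbers; without padding, "10:1" would sort before "4:12". `BoundsFieldSketch`, `key`, and `matches` are illustrative names only.

```java
import java.util.Locale;

public class BoundsFieldSketch {

    // Hypothetical encoding: zero-pad chapter and verse to three digits so that
    // string order matches numeric order (assumes both are below 1000).
    // Locale.ROOT keeps the digits ASCII regardless of the default locale.
    static String key(int chapter, int verse) {
        return String.format(Locale.ROOT, "%03d:%03d", chapter, verse);
    }

    // A document stores its range as two values, one per field
    // (lower_bound and upper_bound). It matches when
    // lower_bound <= term AND upper_bound >= term (inclusive, by assumption).
    static boolean matches(String lowerBound, String upperBound, String term) {
        return lowerBound.compareTo(term) <= 0 && upperBound.compareTo(term) >= 0;
    }

    public static void main(String[] args) {
        String lower = key(3, 5);   // document covers 3:5 ...
        String upper = key(4, 12);  // ... through 4:12
        System.out.println(matches(lower, upper, key(3, 20))); // true
        System.out.println(matches(lower, upper, key(4, 13))); // false
        System.out.println(matches(lower, upper, key(10, 1))); // false
    }
}
```

Each comparison in `matches` corresponds to one of the two open-ended clauses in the compound query, so each document needs only two stored values instead of one per verse.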
Alex Winston wrote:
I was hoping that someone could briefly review my current solution to a problem we have encountered and perhaps suggest an alternative, because as it stands we have pushed Lucene past its current limits.
PROBLEM:
We want to represent a range of values in a particular field and make that field searchable over a range.
An example, for clarification: we want to store a range of chapters and verses of a book for a particular document, and in turn search to see whether a query range includes the range that is represented in the index.
If this is unclear, please ask for clarification.
IMPRACTICAL SOLUTION:
Although this solution seems somewhat impractical, it is all we could come up with.
Our solution involved storing each possible value in the range as a separate term, which allows RangeQuery searches to be performed on this particular field. After profiling, this seems somewhat practical for very small ranges. However, once the field ranges began to span multiple chapters and verses, search times became unreasonable, because we were storing thousands of entries for each represented range.
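To see how the one-term-per-value approach grows, here is a sketch that expands a single document's chapter/verse range into one term per verse. The verse counts per chapter are made-up illustrative values, the zero-padded "CCC:VVV" key format is an assumption, and `ExpandedRangeSketch` and `expandRange` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ExpandedRangeSketch {

    // Zero-padded "CCC:VVV" key so string order matches numeric order
    // (assumes chapter and verse are below 1000).
    static String key(int chapter, int verse) {
        return String.format(Locale.ROOT, "%03d:%03d", chapter, verse);
    }

    // One term per verse from (startChapter, startVerse) through
    // (endChapter, endVerse), inclusive. versesPerChapter[c - 1] is the
    // number of verses in chapter c (a hypothetical lookup table).
    static List<String> expandRange(int[] versesPerChapter,
                                    int startChapter, int startVerse,
                                    int endChapter, int endVerse) {
        List<String> terms = new ArrayList<>();
        for (int c = startChapter; c <= endChapter; c++) {
            int first = (c == startChapter) ? startVerse : 1;
            int last = (c == endChapter) ? endVerse : versesPerChapter[c - 1];
            for (int v = first; v <= last; v++) {
                terms.add(key(c, v));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        int[] versesPerChapter = {31, 25, 24, 26}; // illustrative values only
        List<String> terms = expandRange(versesPerChapter, 1, 1, 4, 26);
        System.out.println(terms.size());                // 106
        System.out.println(terms.get(0));                // 001:001
        System.out.println(terms.get(terms.size() - 1)); // 004:026
    }
}
```

In this example, four short chapters already yield 106 terms for one document, and a range query over the field then has to expand into a clause for every matching term.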
I can elaborate on anything that is unclear, but any thoughts on an alternative solution within Lucene that we overlooked would be extremely helpful.
alex