Isn't the break on line 162 of RangeQuery.java supposed to achieve this?

Alex Winston wrote:
otis,

i was able to fix the junit build problems that the newest versions of
ant cause for the lucene unit tests. it appears that junit.jar must be
in the $ANT_HOME/lib dir in order to run optional taskdefs such as
JUnitTask.

the following link was very helpful.
http://barracuda.enhydra.org/project/mailingLists/barracuda/msg04810.html

additionally i was able to unit test lucene successfully with the
one-line change that i suggested, although i have not looked into how
thorough the unit tests are for cases like this.

the diff follows, against a cvs snapshot from yesterday (note the added
break;):
*** RangeQuery.java Sat Nov 9 09:54:05 2002
--- RangeQuery.java.old Sat Nov 9 09:53:37 2002
***************
*** 164,170 ****
              TermQuery tq = new TermQuery(term);  // found a match
              tq.setBoost(boost);                  // set the boost
              q.add(tq, false, false);             // add to q
-             break; //ADDED!
            }
          } else
--- 164,169 ----
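The question at the top of the thread (what the existing break already does) can be illustrated with a simplified model of the loop. This is plain Java over a sorted list of strings, not Lucene's actual RangeQuery or TermEnum code, and the class and method names are hypothetical. Assuming the real loop has this shape, it already stops at the first term past the upper bound, while the added break stops it after the first in-range term, so only that single term would become a TermQuery clause.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeLoopSketch {

    // Simplified model of the RangeQuery term loop (hypothetical, not Lucene code).
    // sortedTerms stands in for the term enumeration, which visits terms in sorted order.
    // breakAfterFirstMatch models the proposed "break; //ADDED!" line.
    static List<String> collect(List<String> sortedTerms, String lower, String upper,
                                boolean breakAfterFirstMatch) {
        List<String> matched = new ArrayList<>();
        for (String term : sortedTerms) {
            if (term.compareTo(lower) < 0) {
                continue; // stands in for seeking the enumeration to the lower term
            }
            if (term.compareTo(upper) > 0) {
                break; // past the upper bound: the existing break ends the scan here
            }
            matched.add(term); // stands in for q.add(new TermQuery(term), false, false)
            if (breakAfterFirstMatch) {
                break; // the proposed extra break: stop after the first in-range term
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("001", "002", "003", "004", "005", "006");
        System.out.println(collect(terms, "002", "004", false)); // [002, 003, 004]
        System.out.println(collect(terms, "002", "004", true));  // [002]
    }
}
```

If the real code behaves like this model, the speedup would come from dropping every matching term after the first, and with it any document that only contains those later terms.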


i also pondered the ramifications of such a change, and have a few
thoughts. it appears that this succeeds because it eliminates the
massive overhead of the byte[] built by the TermScorer when there are
thousands of terms, but a side-effect may be that it no longer returns
an accurate score. i have yet to test this, and my understanding of
the code is still very limited. although i do not have a firm grasp of
what is involved in scoring, would it be possible to score based on the
number of results matched for this particular field, as opposed to
the current implementation?
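One hypothetical alternative to building a scorer for every term is to walk the in-range terms once and mark the matching documents in a bit set. This sketch treats the index as a sorted TreeMap from term to document ids, which is an assumption for illustration and not Lucene's API; the method names are also hypothetical. A document's score could then be constant, or the number of in-range terms it contains. Later Lucene releases added a RangeFilter class built on a similar bit-set idea.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RangeFilterSketch {

    // Marks each document that contains at least one term in [lower, upper].
    // "postings" is a hypothetical in-memory stand-in for the index: term -> doc ids,
    // kept in sorted term order the way a term enumeration would visit them.
    static BitSet docsInRange(TreeMap<String, int[]> postings, String lower, String upper) {
        BitSet bits = new BitSet();
        NavigableMap<String, int[]> range = postings.subMap(lower, true, upper, true);
        for (int[] docs : range.values()) {
            for (int doc : docs) {
                bits.set(doc); // one bit per document, however many terms matched it
            }
        }
        return bits;
    }

    // Counts how many in-range terms each document contains; this count could
    // serve as a simple score instead of per-term scoring.
    static int[] matchCounts(TreeMap<String, int[]> postings, String lower, String upper, int maxDoc) {
        int[] counts = new int[maxDoc];
        for (int[] docs : postings.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                counts[doc]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("001", new int[] {0, 1});
        postings.put("002", new int[] {1, 2});
        postings.put("003", new int[] {3});
        postings.put("010", new int[] {4});

        System.out.println(docsInRange(postings, "001", "003"));                      // {0, 1, 2, 3}
        System.out.println(Arrays.toString(matchCounts(postings, "001", "003", 5))); // [1, 2, 1, 1, 0]
    }
}
```

The trade-off is memory of one bit per document (plus one int per document if counts are kept) instead of per-term scorer state, at the cost of losing the usual per-term tf/idf weighting.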

any thoughts?

as i look through the code some more i will offer my thoughts on a
possible reimplementation of RangeQuery to alleviate the overhead when
there are thousands of terms, as opposed to this simple one-line change,
which may have hidden side-effects.

i can also send a copy of some simple tests that show how to create this
situation, along with profiling results, if that would be helpful.


thanks
alex



On Fri, 2002-11-08 at 17:40, Alex Winston wrote:

actually i was mistaken. i thought the tests ran successfully, but after
looking again i had merely gotten a BUILD SUCCESSFUL; apparently lucene's
build cannot find JUnitTask out of the box with ant 1.5.1. i have not had
any time to work through the problem. i will look into it tomorrow; if you
have any thoughts in the meantime, let me know.

thanks
alex



On Fri, 2002-11-08 at 16:46, Otis Gospodnetic wrote:

Hello,

Did you say that you ran 'ant test-unit' and that all tests still pass?
If so, could you attach a cvs diff -ucN RangeQuery.java?

Thanks,
Otis


--- Alex Winston <[EMAIL PROTECTED]> wrote:

apologies for replying to myself, but another nice side-effect of this
fix is that it virtually eliminates the potential for an
OutOfMemoryError, which was a problem i encountered on extremely large
fields (over 10000 terms) while i was profiling the RangeQuery class.

i can get into specifics if need be. any thoughts?

alex


On Fri, 2002-11-08 at 15:54, Alex Winston wrote:

thanks for the reply, and my apologies for not explaining myself very
clearly; it has been a long day.

you described exactly our situation. unfortunately this is not an
option, because we want to have multiple ranges for each document as
well. there is a possible extension of what you suggested, but that is
a last resort. kinda crazy, i know, but you have to meet requirements :).

but i also had a thought while i was looking through the lucene code,
and any comments are welcome. i may be very mistaken, because it has
been a long day, but if you look at the current cvs version of
RangeQuery it appears that even after a match is found it continues to
iterate over the terms within the field, and in my case there are
thousands of them. if i add a break after a match has been found, the
search appears to improve by roughly an order of magnitude on average;
my math has left me, so i cannot give a theoretical argument at the
moment. i have unit tested the change on my side and on the lucene side
and it works. note: one hard example is a query that went from 20
seconds to .5 seconds. any initial thoughts on whether there is a case
where this would not work?

beginning line 164:
TermQuery tq = new TermQuery(term);  // found a match
tq.setBoost(boost);                  // set the boost
q.add(tq, false, false);             // add to q
break;  // ADDED!


On Fri, 2002-11-08 at 15:09, Mike Barry wrote:

Alex,

It is rather confusing. It sounds like you've indexed
a field that can be between two values (let's say
E-J), and then when you have a search term such as G
you want the docs containing E-J (or A-H or F-K, but not
A-C nor J-Z).

Just off the top of my head, but could you index the upper and
lower bounds as separate fields, then when you search do a
compound query:

    lower_bound:{ - search_term } AND upper_bound:{ search_term - }

just a thought.

-MikeB.
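The separate-field suggestion above can be modeled in plain Java to check the comparison logic. This is a hypothetical sketch, not Lucene code: the Doc class and method names are made up, and the values are assumed to be zero-padded chapter:verse strings so that string order (how terms are compared) matches numeric order. It shows both Mike's point query and the containment test from the original question (the query range includes the document's range).

```java
import java.util.ArrayList;
import java.util.List;

public class BoundsFieldsSketch {

    // Hypothetical document whose range is stored as two single-valued fields.
    static final class Doc {
        final String id;
        final String lowerBound;
        final String upperBound;

        Doc(String id, String lowerBound, String upperBound) {
            this.id = id;
            this.lowerBound = lowerBound;
            this.upperBound = upperBound;
        }
    }

    // Models lower_bound:{ - term } AND upper_bound:{ term - }:
    // lowerBound < term < upperBound, exclusive on both sides as the braces suggest.
    static List<String> containingPoint(List<Doc> docs, String term) {
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (d.lowerBound.compareTo(term) < 0 && term.compareTo(d.upperBound) < 0) {
                hits.add(d.id);
            }
        }
        return hits;
    }

    // Containment: the query range [qLow, qHigh] includes the document's range,
    // i.e. qLow <= lowerBound AND upperBound <= qHigh.
    static List<String> withinRange(List<Doc> docs, String qLow, String qHigh) {
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (qLow.compareTo(d.lowerBound) <= 0 && d.upperBound.compareTo(qHigh) <= 0) {
                hits.add(d.id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("A", "003:001", "003:020"),
            new Doc("B", "005:010", "007:002"),
            new Doc("C", "010:001", "010:005"));

        System.out.println(containingPoint(docs, "003:010"));        // [A]
        System.out.println(withinRange(docs, "003:001", "008:000")); // [A, B]
    }
}
```

Each query touches only two single-valued fields, so the number of terms per document no longer grows with the width of the range. As noted in the reply above, this model assumes one range per document; with several ranges per document the lower and upper values would no longer pair up.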

Alex Winston wrote:


i was hoping that someone could briefly review my current solution to a
problem that we have encountered, to see if anyone could suggest a
possible alternative, because as it stands we have pushed lucene past
its current limits.

PROBLEM:

we want to represent a range of values for a particular field that is
searchable over a particular range.

an example follows for clarification:
we want to store a range of chapters and verses of a book for a
particular document, and in turn search to see if a query range
includes the range that is represented in the index.

if this is unclear please ask for clarification

IMPRACTICAL SOLUTION:

although this solution seems somewhat impractical, it is all we could
come up with.

our solution involved storing each possible value in the range as a
term, which allows RangeQuerys to be performed on this particular
field. after profiling, this seems somewhat practical for very small
ranges. however, once the field ranges began to span multiple chapters
and verses, the search times became unreasonable because we were
storing thousands of entries for each represented range.
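The term blowup described above can be sketched as follows. This is a minimal illustration, not the actual indexing code: it assumes a hypothetical fixed number of verses per chapter and zero-padded chapter:verse keys (so that string order, which is how terms are compared, matches numeric order). Each document stores one term per verse in its range, so a range spanning many chapters yields thousands of terms, and a query range covering them can match thousands of distinct terms, each of which RangeQuery turns into its own TermQuery clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RangeExpansionSketch {

    // Hypothetical: real books have a different verse count per chapter.
    static final int VERSES_PER_CHAPTER = 30;

    // Zero-padded so that string order matches numeric order ("002:010" < "010:002").
    static String key(int chapter, int verse) {
        return String.format(Locale.ROOT, "%03d:%03d", chapter, verse);
    }

    // Expands an inclusive chapter:verse range into one term per verse,
    // as in the "store each possible range value" approach.
    static List<String> expand(int fromChapter, int fromVerse, int toChapter, int toVerse) {
        List<String> terms = new ArrayList<>();
        for (int c = fromChapter; c <= toChapter; c++) {
            int first = (c == fromChapter) ? fromVerse : 1;
            int last = (c == toChapter) ? toVerse : VERSES_PER_CHAPTER;
            for (int v = first; v <= last; v++) {
                terms.add(key(c, v));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(expand(3, 28, 4, 2));         // [003:028, 003:029, 003:030, 004:001, 004:002]
        System.out.println(expand(1, 1, 50, 30).size()); // 1500
    }
}
```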

i can elaborate on anything that is unclear, but any thoughts on a
possible alternative solution within lucene that we overlooked would be
extremely helpful.

alex


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>

