unless i am mistaken this break will only occur if the current term within the field is greater than (or equal to when exclusive) the upperTerm. so if a matching term has been found within the range it will still continue to iterate until a term meets this criteria, or the while loop ends, unless this is the intended behavior and I am overlooking something.
any thoughts? thanks alex On Tue, 2002-11-12 at 13:25, Doug Cutting wrote: > Isn't the break on line 162 of RangeQuery.java supposed to achieve this? > > Alex Winston wrote: > > otis, > > > > i was able to fix the junit build problems, with the newest versions of > > ant in regards to lucene unit tests. it appears that the junit.jar must > > appear in the $ANT_HOME/lib dir in order to run such optional taskdefs > > as JUnitTask. > > > > the following link was very helpful. > > http://barracuda.enhydra.org/project/mailingLists/barracuda/msg04810.html > > > > additionally i was able to unit test lucene with the one line change > > that i suggested with success, although i have not looked into how > > thorough the unit tests are for cases like this. > > > > the diff follows from a cvs snapshot from yesterday (note the added > > break;): > > *** RangeQuery.java Sat Nov 9 09:54:05 2002 > > --- RangeQuery.java.old Sat Nov 9 09:53:37 2002 > > *************** > > *** 164,170 **** > > TermQuery tq = new > > TermQuery(term); // found a match > > tq.setBoost(boost); // set > > the boost > > q.add(tq, false, false); // add > > to q > > - break; //ADDED! > > } > > } > > else > > --- 164,169 ---- > > > > > > i also pondered the ramifications of such a change, and have a few > > thoughts. it appears that this is successful because it eliminates the > > massive overhead of the byte[] built by the TermScorer when there are > > thousands of terms, but a side-effect may be that it will not accurately > > return a valid score. i have yet to test this, and my understanding of > > the code is still very limited. although i do not have a firm grasp of > > what is involved in scoring, is there not a possibility to score based > > on the number of results matched for this particular field as opposed to > > the current implementation. > > > > any thoughts? > > > > as i look through the code some more i will offer my thoughts on a > > possible reimplementation of RangeQuery to alleviate the overhead when > > there are thousands of terms as opposed to this simple one line change > > which may have hidden side-effects. > > > > i can also send a copy of some simple tests to show how to create this > > situation with profiling results if that would be helpful. > > > > > > thanks > > alex > > > > > > > > On Fri, 2002-11-08 at 17:40, Alex Winston wrote: > > > >>actually i was mistaken, i thought the tests ran successfully but after > >>looking again i merely got a BUILD SUCCESSFUL, apparently lucenes build > >>cannot find JUnitTask out of the box with ant1.5.1. i have not had any > >>time to work through the problem. i will look into it tomorrow, if you > >>have any thoughts in the meantime let me know. > >> > >>thanks > >>alex > >> > >> > >> > >>On Fri, 2002-11-08 at 16:46, Otis Gospodnetic wrote: > >> > >>>Hello, > >>> > >>>Did you say that you run 'ant test-unit' and that all tests still pass? > >>>If so, could you attach a cvs diff -ucN RangeQuery.java? > >>> > >>>Thanks, > >>>Otis > >>> > >>> > >>>--- Alex Winston <[EMAIL PROTECTED]> wrote: > >>> > >>>>apologizes for replying to myself, but another nice side-effect of > >>>>this > >>>>fix is that it virtually eliminates the potential for an > >>>>OutOfMemoryError, which was a problem i encountered on extremely > >>>>large > >>>>fields, over 10000 terms, while i was profiling the RangeQuery class. > >>>> > >>>>i can get into specifics if need be, any thoughts? > >>>> > >>>>alex > >>>> > >>>> > >>>> On Fri, 2002-11-08 at 15:54, Alex Winston wrote: > >>>> > >>>>>thanks for the reply, my apologizes for not explaining myself very > >>>>>clearly, it has been a long day. > >>>>> > >>>>>you expressed exactly our situation, unfortunately this is not an > >>>> > >>>>option > >>>> > >>>>>because we want to have multiple ranges for each document as well, > >>>>>there is a possible extension of what you suggested but that is a > >>>> > >>>>last > >>>> > >>>>>resort. kinda crazy i know, but you have to meet requirements :). > >>>>> > >>>>>but i also had a thought while i was looking through the lucene > >>>> > >>>>code, > >>>> > >>>>>and any comments are welcome. > >>>>> > >>>>>i may be very mistaken because it has been a long day but if you > >>>> > >>>>look at > >>>> > >>>>>the current cvs version of RangeQuery it appears that even if a > >>>> > >>>>match is > >>>> > >>>>>found it will continue to iterate over terms within a field, and in > >>>> > >>>>my > >>>> > >>>>>case it is on the order of thousands. if i add a break after a > >>>> > >>>>match > >>>> > >>>>>has been found it appears as though the search is improved on avg > >>>> > >>>>an > >>>> > >>>>>order of magnitude, my math has left me so i cannot be theoretical > >>>> > >>>>at > >>>> > >>>>>the moment. i have unit tested the change on my side and on the > >>>> > >>>>lucene > >>>> > >>>>>side and it works. note: one hard example is that a query went > >>>> > >>>>from 20 > >>>> > >>>>>seconds to .5 seconds. any initial thoughts to if there is a case > >>>> > >>>>where > >>>> > >>>>>this would not work? > >>>>> > >>>>>beginning line 164: > >>>>>TermQuery tq = new TermQuery(term); // found a match > >>>>>tq.setBoost(boost); // set the boost > >>>>>q.add(tq, false, false); // add to q > >>>>>break; // ADDED! > >>>>> > >>>>> > >>>>>On Fri, 2002-11-08 at 15:09, Mike Barry wrote: > >>>>> > >>>>>>Alex, > >>>>>> > >>>>>>It is rather confusing. It sounds like you've indexed > >>>>>>a field that that can be between two values (let's say > >>>>>>E-J) and then when you have a search term such as G > >>>>>>you want the docs containing E-J (or A-H or F-K but not A-H > >>>>>>nor A-C nor J-Z) > >>>>>> > >>>>>>Just of the top of my head but could you index the upper and > >>>>>>lower bounds as separate fields then when you search do a > >>>>>>compound query: > >>>>>> > >>>>>> lower_bound:{ - search_term } AND upper_bound:{ search_term > >>>>> > >>>>- } > >>>> > >>>>>>just a thought. > >>>>>> > >>>>>>>-MikeB. > >>>>>> > >>>>>> > >>>>>>Alex Winston wrote: > >>>>>> > >>>>>> > >>>>>>>i was hoping that someone could briefly review my current > >>>>>> > >>>>solution to a > >>>> > >>>>>>>problem that we have encountered to see if anyone could suggest > >>>>>> > >>>>a > >>>> > >>>>>>>possible alternative, because as it stands we have pushed > >>>>>> > >>>>lucene past > >>>> > >>>>>>>its current limits. > >>>>>>> > >>>>>>>PROBLEM: > >>>>>>> > >>>>>>>we were wanting to represent a range of values for a particular > >>>>>> > >>>>field > >>>> > >>>>>>>that is searchable over a particular range. > >>>>>>> > >>>>>>>an example follows for clarification: > >>>>>>>we were wanting to store a range of chapters and verses of a > >>>>>> > >>>>book for a > >>>> > >>>>>>>particular document, and in turn search to see if a query range > >>>>>> > >>>>includes > >>>> > >>>>>>>the range that is represented in the index. > >>>>>>> > >>>>>>>if this is unclear please ask for clarification > >>>>>>> > >>>>>>>IMPRACTICAL SOLUTION: > >>>>>>> > >>>>>>>although this solution seems somewhat impractical it is all we > >>>>>> > >>>>could > >>>> > >>>>>>>come up with. > >>>>>>> > >>>>>>>our solution involved storing each possible range value within > >>>>>> > >>>>the term > >>>> > >>>>>>>which would allow for RangeQuerys to be performed on this > >>>>>> > >>>>particular > >>>> > >>>>>>>field. for very small ranges this seems somewhat practical > >>>>>> > >>>>after > >>>> > >>>>>>>profiling. although once the field ranges began to span > >>>>>> > >>>>multiple > >>>> > >>>>>>>chapters and verses, the search times became unreasonable > >>>>>> > >>>>because we > >>>> > >>>>>>>were storing thousands of entries for each representative > >>>>>> > >>>>range. > >>>> > >>>>>>>i can elaborate on anything that is unclear, > >>>>>>>but any thoughts on a possible alternative solution within > >>>>>> > >>>>lucene that > >>>> > >>>>>>>we overlooked would be extremely helpful. > >>>>>>> > >>>>>>> > >>>>>>>alex > >>>>>> > >>>>>> > >>>>>> > >>>>>>-- > >>>>>>To unsubscribe, e-mail: > >>>>> > >>>><mailto:lucene-user-unsubscribe@;jakarta.apache.org> > >>>> > >>>>>>For additional commands, e-mail: > >>>>> > >>>><mailto:lucene-user-help@;jakarta.apache.org> > >>>> > >>>>>> > >>>> > >>>>ATTACHMENT part 2 application/pgp-signature name=signature.asc > >>> > >>> > >>> > >>>__________________________________________________ > >>>Do you Yahoo!? > >>>U2 on LAUNCH - Exclusive greatest hits videos > >>>http://launch.yahoo.com/u2 > >>> > >>>-- > >>>To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@;jakarta.apache.org> > >>>For additional commands, e-mail: <mailto:lucene-user-help@;jakarta.apache.org> > >>> > >>> > > > > > > -- > To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@;jakarta.apache.org> > For additional commands, e-mail: <mailto:lucene-user-help@;jakarta.apache.org> -- Alex Winston <[EMAIL PROTECTED]> -- To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@;jakarta.apache.org> For additional commands, e-mail: <mailto:lucene-user-help@;jakarta.apache.org>
