>Date: Wed, 10 Sep 2003 14:11:30 -0700
>From: Doug Cutting <[EMAIL PROTECTED]>
>Subject: Re: Issue with Similarity and negative numbers
>
>-1
>
>This would be an incompatible change that could break lots of folks.
>Also, the range of values that you represent in your one-byte float
>format is less useful to most Lucene applications. Negative values are
>rarely used, and normalizing values to be between 0 and 1 is not always
>easy.

I had taken care to make sure that the change was *compatible*. :-(
What about the change would break lots of folks?

My rationale was that if the mapping of positive bytes to positive floats
and vice versa was unchanged, the only way to store negative bytes in the
index would be to use a negative float as a field or document boost.

>Can you please describe more about what you're trying to achieve? There
>are lots of other ways of efficiently implementing date-sorted search
>results. For example, you can add the documents to the index in
>chronological order, then use a HitFilter which collects the documents
>with the highest document id. That is very efficient and requires no
>changes to Lucene.

I have a highly dynamic index of news headlines where the incoming
headlines are often not in chronological order. To make things worse,
changes must be made to headlines post-indexing without affecting
their chronological order.

I override the default Similarity instance to disable field normalization
and set the date-sorting 'hint' using Document.setBoost(float).

Also, using the score I can implement forward / back paging, as the
score is persistent and the document ids are not. I do this by using an
org.apache.lucene.search.Filter, accessing the scores through
IndexReader.norms(String field) and only setting a bit in the BitSet
when the score is in the required range.

A previous solution used the HitFilter and document id approach that you
suggested. Alas, it did not work 100% correctly.

Is there a FAQ entry about common date-sorting methods?

>Cheers,
>
>Doug
>

Many thanks for a great product!

Nick

>Nick Smith wrote:
>> Hi Luceners!
>>
>> I am misusing the document score for date sorting (I display news
>> headlines in a chronological list).
>>
>> As the document score is ultimately encoded as a byte the maximum
>> possible number of values is 256 minus the special value of 0
>> (document not found).
>>
>> In the current implementation; all negative float values get
>> rounded up to zero by Similarity.floatToByte() and the method
>> Similarity.byteToFloat() returns only values in the range of
>> 1 to 127 values that are greater than the decode for the
>> next lower byte value.
>>
>> i.e.
>> Similarity.byteToFloat(byteVal+1) > Similarity.byteToFloat(byteVal)
>>
>> For my application having 255 possible scores from searches was better
>> than 127 so....
>>
>> I have patched the Similarity class to encode negative floats into
>> the negative byte values and to decode the negative byte values back
>> into negative floats.
>>
>> The encoding of the positive values are unchanged by this patch.
>>
>> Could this version please be checked into CVS by someone with commit
>> rights? Or is there are a more formal procedure to submitting patches,
>> say via the Bugzilla?
>>
>> Many Thanks,
>>
>> Nick Smith
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
>
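
For readers following the thread, here is a minimal sketch of the two ideas discussed above: a sign-symmetric one-byte float encoding, and a range selection over a norms-style byte array. It is not the patch itself and not Lucene's actual bit layout. The class SignedByteCodec, the 4-bit exponent / 3-bit mantissa layout, and the selectInRange helper are all assumptions made for illustration. Only the method names floatToByte and byteToFloat, the byte[] shape of IndexReader.norms(String), and the use of a BitSet come from the messages above.

```java
import java.util.BitSet;

// Hypothetical illustration only; NOT Lucene's Similarity implementation.
final class SignedByteCodec {
    // Assumed layout: codes 1..127 hold a 4-bit exponent and 3-bit mantissa;
    // code 0 is reserved for 0.0f. Values increase strictly with the code.
    private static final float[] MAGNITUDE = new float[128];
    static {
        for (int code = 1; code < 128; code++) {
            int exponent = code >> 3;   // 0..15
            int mantissa = code & 7;    // 0..7
            MAGNITUDE[code] = Math.scalb(1.0f + mantissa / 8.0f, exponent - 7);
        }
    }

    // Encodes f by rounding its magnitude down to the nearest representable
    // value and storing the sign as the sign of the byte.
    static byte floatToByte(float f) {
        if (f == 0.0f || Float.isNaN(f)) {
            return 0;
        }
        float magnitude = Math.abs(f);
        int code = 1;                   // underflow clamps to the smallest non-zero code
        while (code < 127 && MAGNITUDE[code + 1] <= magnitude) {
            code++;                     // overflow stops at the largest code
        }
        return (byte) (f < 0 ? -code : code);
    }

    // Decodes a byte; negative bytes mirror the positive ones.
    static float byteToFloat(byte b) {
        int code = Math.min(Math.abs((int) b), 127);  // -128 is treated as -127
        return b < 0 ? -MAGNITUDE[code] : MAGNITUDE[code];
    }

    // Stand-in for the Filter described above: marks every document whose
    // decoded value lies in [low, high]. The norms array is assumed to hold
    // one encoded byte per document id.
    static BitSet selectInRange(byte[] norms, float low, float high) {
        BitSet bits = new BitSet(norms.length);
        for (int doc = 0; doc < norms.length; doc++) {
            float value = byteToFloat(norms[doc]);
            if (value >= low && value <= high) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        byte encoded = floatToByte(-2.5f);
        System.out.println(encoded + " decodes to " + byteToFloat(encoded));
    }
}
```

With this layout the ordering property quoted above, byteToFloat(b+1) > byteToFloat(b), holds across the whole range from -127 to 127, so a page of results can be selected by value range alone, without relying on document ids.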
