Hi,

On Thu, Apr 18, 2013 at 3:46 PM, Gaurav Ranjan <gaurav.ranjan.i...@gmail.com> wrote:
> I am a student and studying the functionality of Lucene for my project work.
> The DocDelta example on this link is not clear
> http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene40/Lucene40PostingsFormat.html?is-external=true
> ,
>
> Please explain the first part how we are getting 15,8,3 as the TermFreqs
> for the example.
The term appears once in doc 7 and 3 times in doc 11. In real-world cases, freqs are very often equal to 1, so Lucene40PostingsFormat tries to use as little data as possible (one bit here) when the freq is 1. Here are the steps performed:

1. Raw doc IDs and freqs -> 7, 1, 11, 3
2. Delta-encode the doc IDs -> 7, 1, 4, 3 (11 - 7 = 4)
3. Multiply the deltas by 2 -> 14, 1, 8, 3
4. When the frequency is 1, omit it and add one to the doc delta -> 15, 8, 3 (14 + 1 = 15)

To decode, just perform the steps in reverse order:

1. Encoded data -> 15, 8, 3
2. When the doc delta is odd, the frequency was omitted and is equal to 1, so subtract one from the doc delta; when it is even, the next value is the frequency (8 is followed by 3) -> 14, 1, 8, 3
3. Divide the deltas by 2 -> 7, 1, 4, 3
4. Restore the absolute doc IDs -> 7, 1, 11, 3 (7 + 4 = 11)

--
Adrien
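For illustration, the encoding and decoding steps described in the reply can be sketched in Java. This is a minimal, hypothetical sketch: the class `DocDeltaSketch` and its `encode`/`decode` methods are made-up names, not part of Lucene's API, and the sketch produces a plain list of ints instead of writing VInts to an index file the way the real postings writer does.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the doc-delta/freq interleaving described for
 * Lucene40PostingsFormat. Not Lucene's code: it emits plain ints rather
 * than VInt-encoded bytes.
 */
public class DocDeltaSketch {

    /** Encodes parallel arrays of absolute doc IDs and term frequencies. */
    static List<Integer> encode(int[] docIds, int[] freqs) {
        List<Integer> out = new ArrayList<>();
        int lastDoc = 0;
        for (int i = 0; i < docIds.length; i++) {
            int delta = docIds[i] - lastDoc;   // delta-encode the doc ID
            lastDoc = docIds[i];
            int code = delta * 2;              // multiply the delta by 2 (low bit is now 0)
            if (freqs[i] == 1) {
                out.add(code + 1);             // freq == 1: add one (odd code), omit the freq
            } else {
                out.add(code);                 // even code: the freq follows explicitly
                out.add(freqs[i]);
            }
        }
        return out;
    }

    /** Decodes the interleaved values back into {docId, freq} pairs. */
    static List<int[]> decode(List<Integer> encoded) {
        List<int[]> postings = new ArrayList<>();
        int doc = 0;
        int i = 0;
        while (i < encoded.size()) {
            int code = encoded.get(i++);
            int freq;
            if ((code & 1) != 0) {
                freq = 1;                      // odd: freq was omitted and equals 1
            } else {
                freq = encoded.get(i++);       // even: the next value is the freq
            }
            doc += code >>> 1;                 // drop the low bit and divide by 2, then add to previous doc ID
            postings.add(new int[] {doc, freq});
        }
        return postings;
    }

    public static void main(String[] args) {
        List<Integer> encoded = encode(new int[] {7, 11}, new int[] {1, 3});
        System.out.println(encoded);           // prints [15, 8, 3]
        for (int[] p : decode(encoded)) {
            System.out.println("doc=" + p[0] + " freq=" + p[1]);
        }
    }
}
```

Running `main` prints `[15, 8, 3]`, then `doc=7 freq=1` and `doc=11 freq=3`, matching the example above. Shifting right by one in `decode` drops the low bit and halves the value in a single step, which is equivalent to "subtract one if odd, then divide by 2".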