The posting list is already compressed, using a specialised technique designed for sequences of integers. Currently the codec uses a variant of Patched Frame of Reference (PFOR) coding to perform this compression.
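To give a rough idea of what this kind of coding does, here is a minimal sketch in Java. It is not Lucene's actual implementation: the class and method names (PforSketch, Block, encode, decode) are made up for illustration, and it assumes a single block of ascending, non-negative doc IDs with a bit width chosen by the caller. Each doc ID is turned into a gap from the previous one, the low bits of every gap are packed at a fixed width, and the few gaps that do not fit are "patched" with their high bits stored separately as exceptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and layout are assumptions, not Lucene's codec. */
final class PforSketch {
    /** One encoded block: packed low bits plus the exceptions ("patches"). */
    static final class Block {
        final int count, bitsPerValue;
        final long[] packed;
        final int[] exceptionIndex, exceptionHighBits;
        Block(int count, int bitsPerValue, long[] packed, int[] idx, int[] high) {
            this.count = count; this.bitsPerValue = bitsPerValue;
            this.packed = packed; this.exceptionIndex = idx; this.exceptionHighBits = high;
        }
    }

    static Block encode(int[] sortedDocIds, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 31) throw new IllegalArgumentException("width must be 1..31");
        int n = sortedDocIds.length;
        long mask = (1L << bitsPerValue) - 1;
        long[] packed = new long[(int) (((long) n * bitsPerValue + 63) / 64)];
        List<Integer> idx = new ArrayList<>();
        List<Integer> high = new ArrayList<>();
        int previous = 0;
        for (int i = 0; i < n; i++) {
            int delta = sortedDocIds[i] - previous; // gap to the previous doc ID
            if (delta < 0) throw new IllegalArgumentException("doc IDs must be ascending");
            previous = sortedDocIds[i];
            writeBits(packed, (long) i * bitsPerValue, bitsPerValue, delta & mask);
            int overflow = delta >>> bitsPerValue; // bits that did not fit in the frame
            if (overflow != 0) { idx.add(i); high.add(overflow); }
        }
        return new Block(n, bitsPerValue, packed, toArray(idx), toArray(high));
    }

    static int[] decode(Block b) {
        int[] out = new int[b.count];
        for (int i = 0; i < b.count; i++) {
            out[i] = (int) readBits(b.packed, (long) i * b.bitsPerValue, b.bitsPerValue);
        }
        for (int e = 0; e < b.exceptionIndex.length; e++) { // patch the exceptions back in
            out[b.exceptionIndex[e]] |= b.exceptionHighBits[e] << b.bitsPerValue;
        }
        int previous = 0;
        for (int i = 0; i < out.length; i++) { // prefix sum turns gaps back into doc IDs
            previous += out[i];
            out[i] = previous;
        }
        return out;
    }

    private static void writeBits(long[] packed, long bitPos, int width, long value) {
        int word = (int) (bitPos >>> 6), offset = (int) (bitPos & 63);
        packed[word] |= value << offset;
        if (offset + width > 64) packed[word + 1] |= value >>> (64 - offset); // spill into next word
    }

    private static long readBits(long[] packed, long bitPos, int width) {
        int word = (int) (bitPos >>> 6), offset = (int) (bitPos & 63);
        long value = packed[word] >>> offset;
        if (offset + width > 64) value |= packed[word + 1] << (64 - offset);
        return value & ((1L << width) - 1);
    }

    private static int[] toArray(List<Integer> list) {
        return list.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 8, 15, 1000, 1002, 1003};
        Block block = encode(docIds, 3); // 3 bits per gap; the large gap becomes an exception
        System.out.println("exceptions: " + block.exceptionIndex.length
                + ", round trip ok: " + java.util.Arrays.equals(docIds, decode(block)));
    }
}
```

In this sketch the seven doc IDs fit into a single 64-bit word plus one exception. A real codec works on fixed-size blocks and chooses the bit width per block, so that only a small fraction of values end up as exceptions.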
A good survey of such techniques can be found in the standard IR textbooks (https://mitpress.mit.edu/books/information-retrieval, http://www.amazon.com/Managing-Gigabytes-Compressing-Multimedia-Information/dp/1558605703, http://nlp.stanford.edu/IR-book/) as well as in this paper: http://eprints.gla.ac.uk/93572/1/93572.pdf

Interestingly, there are potentially some wins in finding better integer codings (and one of my personal projects is aimed at doing exactly this), but I doubt that LZ4-compressing the posting list would help all that much.

Hope this helps

On Mon, Mar 28, 2016, at 10:51 AM, Vishwas Jain wrote:
> Hello,
>
> We are trying to implement better compression techniques in the
> lucene54 codec of Apache Lucene. Currently there is no such compression
> for posting lists in the lucene54 codec, but the LZ4 compression
> technique is used for stored fields. Does anyone know why there is no
> compression technique for posting lists? And what are the possible
> compression techniques that would be of benefit if implemented?
>
> Thanks