Unalian commented on PR #41114: URL: https://github.com/apache/doris/pull/41114#issuecomment-2370747500
> @Unalian
>
> 1. Earlier versions of Doris only applied the PFOR compression algorithm to the inverted lists of the inverted index. We plan to use the PFOR algorithm for position information and Zstd compression for dictionary information, which will further reduce the space usage of Doris indexes.
> 2. Doris only utilizes column storage and inverted indexes, resulting in less space usage compared to Elasticsearch.

Thank you for answering!

1. I can see from the code that PFOR compression is applied to the .frq files. However, .frq files (and, under your plan, .prx files) are always tiny compared with the .tis file (as seen with CLucene). Does PFOR compression on .frq/.prx have a big impact on the inverted index size?

   <img width="263" alt="image" src="https://github.com/user-attachments/assets/b0408c10-20d0-4e02-ac65-94cd30dd6938">

   Or is there a big difference between the sizes of the inverted index files generated by CLucene and those generated by Doris?
2. Ignoring all the data and considering only the size of the inverted index: I used Lucene to build an inverted index for the same data, and Doris uses less space. This makes me curious.
