Unalian commented on PR #41114:
URL: https://github.com/apache/doris/pull/41114#issuecomment-2370747500

   > @Unalian
   > 
   > 1. Earlier versions of Doris only applied the PFOR compression algorithm 
to the inverted lists of the inverted index.  We plan to use the PFOR algorithm 
for position information and Zstd compression for dictionary information, which 
will further reduce the space usage of Doris indexes.
   > 2. Doris only utilizes column storage and inverted indexes, resulting in 
less space usage compared to Elasticsearch.
   
   Thank you for answering! 
   1. I can see from the code that PFOR compression is applied to .frq files. 
However, .frq files (and, per your plan, .prx files) are always tiny compared 
with .tis files (as seen with CLucene). Does applying the PFOR compression 
algorithm to .frq/.prx have a big impact on the inverted index size? 
   <img width="263" alt="image" 
src="https://github.com/user-attachments/assets/b0408c10-20d0-4e02-ac65-94cd30dd6938">
   Or is there a big difference between the sizes of the inverted index files 
generated by CLucene and those generated by Doris?
   
   2. If we ignore the data itself and consider only the size of the inverted 
index: I used Lucene to build an inverted index over the same data, and Doris 
still used less space. I am curious why.
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

