somandal commented on issue #7870:
URL: https://github.com/apache/pinot/issues/7870#issuecomment-1189797096

   Hey @Jackie-Jiang @walterddr 
   
   As discussed over Slack, we now have the compression results for the actual 
table that ran into this forward index size bloat issue. I've updated the 
document in 
[this 
section](https://docs.google.com/document/d/1BWtNKvxL1Uaydni_BJCgWN8i9_WeSdgL3Ksh4IpY_K0/edit#heading=h.cq0je3xwcssi).
 The TL;DR is that for the actual table the compression savings are minimal. 
I updated the 
[recommendations](https://docs.google.com/document/d/1BWtNKvxL1Uaydni_BJCgWN8i9_WeSdgL3Ksh4IpY_K0/edit#heading=h.b4ch3eh9yztq)
 to indicate that, for now, it does not make sense to try to solve this by 
compressing the data with any of the proposed approaches.
   
   @siddharthteotia and I would like to keep this issue open so we can explore 
further ideas in the future, or perhaps revisit compression with a dictionary 
if we find users whose data is repetitive enough to benefit from compression.
   
   Also, as discussed on our call, there may be some value in implementing 
Approach 2 from the proposed approaches for the sake of speeding up queries 
rather than saving on storage costs (i.e. keep a dictionary but store the 
forward index in raw format, which avoids an additional dictionary lookup). 
I had started work on Approach 2 and had an initial PR before we ran these 
compression experiments. My PR stores the data as raw + compressed in the 
forward index but still creates a dictionary. I need to spend some time 
working out how best to split this into separate PRs before submitting it to 
OSS. Just wanted to give a heads up.
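
   To illustrate the read-path difference behind Approach 2, here is a minimal 
sketch. It is not Pinot code: the class `ForwardIndexSketch` and all of its 
fields and methods are hypothetical names made up for this example, the column 
is assumed to hold `long` values, and the compression step is left out. It only 
shows that a dictionary-encoded forward index needs two reads per document (the 
dictionary id, then the value), while a raw forward index needs one, and that 
the dictionary can still be kept alongside for value lookups.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names are Pinot classes.
final class ForwardIndexSketch {
    // Sorted distinct values of the column.
    final long[] dictionary;
    // Dictionary-encoded forward index: docId -> dictionary id.
    final int[] dictIdForwardIndex;
    // Raw forward index: docId -> value (uncompressed in this sketch).
    final long[] rawForwardIndex;

    ForwardIndexSketch(long[] columnValues) {
        dictionary = Arrays.stream(columnValues).distinct().sorted().toArray();
        rawForwardIndex = columnValues.clone();
        dictIdForwardIndex = new int[columnValues.length];
        for (int docId = 0; docId < columnValues.length; docId++) {
            // Every value is present in the dictionary, so the result is >= 0.
            dictIdForwardIndex[docId] =
                Arrays.binarySearch(dictionary, columnValues[docId]);
        }
    }

    // Dictionary-encoded read: forward index lookup plus dictionary lookup.
    long readViaDictionary(int docId) {
        return dictionary[dictIdForwardIndex[docId]];
    }

    // Raw read: a single forward index lookup, no dictionary involved.
    long readRaw(int docId) {
        return rawForwardIndex[docId];
    }

    // The dictionary remains available, e.g. to check whether a value exists.
    boolean containsValue(long value) {
        return Arrays.binarySearch(dictionary, value) >= 0;
    }

    public static void main(String[] args) {
        ForwardIndexSketch column =
            new ForwardIndexSketch(new long[] {42L, 7L, 42L, 19L});
        for (int docId = 0; docId < 4; docId++) {
            System.out.println(column.readRaw(docId)
                + " == " + column.readViaDictionary(docId));
        }
    }
}
```

   Both read paths return the same value for every document; the raw path 
simply skips the second array access, at the cost of storing the full values 
in the forward index.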
   
   cc @siddharthteotia  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

