somandal commented on issue #7870: URL: https://github.com/apache/pinot/issues/7870#issuecomment-1189797096
Hey @Jackie-Jiang @walterddr

As discussed over Slack, we got the compression results for the actual table that ran into this forward index size bloat issue. I've updated the document in [this section](https://docs.google.com/document/d/1BWtNKvxL1Uaydni_BJCgWN8i9_WeSdgL3Ksh4IpY_K0/edit#heading=h.cq0je3xwcssi). The TL;DR is that for the actual table the compression savings are minimal. I updated the [recommendations](https://docs.google.com/document/d/1BWtNKvxL1Uaydni_BJCgWN8i9_WeSdgL3Ksh4IpY_K0/edit#heading=h.b4ch3eh9yztq) to indicate that, for now, it does not make sense to try to solve this by compressing the data using any of the proposed approaches.

@siddharthteotia and I would like to keep this issue open to explore further ideas in the future, or perhaps to revisit compression with a dictionary if we find users whose data is repetitive enough to benefit from it.

Also, as discussed on our call, implementing Approach 2 from the proposed approaches may still be useful for speeding up queries rather than for saving on storage costs: keeping a dictionary while storing the forward index in raw format can avoid an additional dictionary lookup.

I had started some work on Approach 2 and have an initial PR from before we ran these compression experiments. My PR stores the data as raw + compressed in the forward index but still creates a dictionary. I need to spend some time figuring out how best to divide up the PRs before submitting this to OSS. Just wanted to give a heads up.

cc @siddharthteotia
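To make the "additional dictionary lookup" point concrete, here is a minimal Java sketch of the two read paths. It is only an illustration under simplifying assumptions: the class, field, and method names are hypothetical, plain in-memory arrays stand in for the on-disk structures, and none of this reflects Pinot's actual reader classes or the PR mentioned above.

```java
import java.util.Arrays;

public class ForwardIndexSketch {
    // Hypothetical dictionary: the sorted distinct values of one column.
    static final String[] DICTIONARY = {"apple", "banana", "cherry"};

    // Dictionary-encoded forward index: one dictionary id per document.
    static final int[] DICT_ID_FORWARD_INDEX = {2, 0, 0, 1};

    // Raw forward index for the same column: the value itself per document.
    static final String[] RAW_FORWARD_INDEX = {"cherry", "apple", "apple", "banana"};

    // Two hops: docId -> dictId -> value.
    static String readViaDictionary(int docId) {
        int dictId = DICT_ID_FORWARD_INDEX[docId];
        return DICTIONARY[dictId];
    }

    // One hop: docId -> value, with no dictionary access on the read path.
    static String readRaw(int docId) {
        return RAW_FORWARD_INDEX[docId];
    }

    // The dictionary remains usable for other purposes,
    // e.g. resolving a value to its id (negative if the value is absent).
    static int indexOfValue(String value) {
        return Arrays.binarySearch(DICTIONARY, value);
    }

    public static void main(String[] args) {
        for (int docId = 0; docId < RAW_FORWARD_INDEX.length; docId++) {
            System.out.println(readViaDictionary(docId) + " == " + readRaw(docId));
        }
    }
}
```

Both paths return the same value for every document; the raw path simply skips the second array access, which is the query-time saving described above, at the cost of storing full values in the forward index.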
