siddharthteotia opened a new pull request #6284: URL: https://github.com/apache/incubator-pinot/pull/6284
Adding support for text index without raw data. Our use cases involve indexing huge blobs of text (STRING) data, and we have seen the raw forward index reach up to 3GB per segment for text data. As an example, for a few of our tables, supporting this mode of text index will save up to 2TB of storage space per colo. These tables use the text index only in the filter clause. The raw values are neither projected nor used in any filter other than the text_match clause.

- Like other text index configs, the behavior can be enabled on a per-index basis through the table config.
- The actual raw value is used for building the text index.
- For the forward index, a dummy value (which the user can also provide) will be used. The default would be "null".
- The forward index can be dictionary encoded (this part is not yet covered in the PR but will be addressed in the same PR).

A potential follow-up will be to not have the column at all and just have the index. However, this requires semantic changes in quite a few places in the code that assume that the forward index is always there.
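To make the intended behavior concrete, here is a minimal, self-contained sketch of the idea in Python. It is not Pinot code: the real text index is Lucene-based and built in Java, while this toy uses a plain term-to-docId map as a stand-in. The names `build_column`, `text_match`, `no_raw_data`, and `dummy_value` are hypothetical and chosen only for illustration. The point it shows is that the raw value feeds the text index, while the forward index stores only the placeholder.

```python
import re
from collections import defaultdict

# Default placeholder mentioned in the PR description.
DEFAULT_DUMMY_VALUE = "null"


def build_column(raw_values, no_raw_data=False, dummy_value=DEFAULT_DUMMY_VALUE):
    """Build a toy (forward_index, text_index) pair for one text column.

    Hypothetical helper for illustration; not part of Pinot.
    """
    text_index = defaultdict(set)  # term -> set of doc ids
    forward_index = []             # doc id -> stored value
    for doc_id, raw in enumerate(raw_values):
        # The text index is always built from the actual raw value.
        for term in re.findall(r"\w+", raw.lower()):
            text_index[term].add(doc_id)
        # The forward index keeps the raw value only when raw data is retained;
        # otherwise it stores the small placeholder.
        forward_index.append(dummy_value if no_raw_data else raw)
    return forward_index, dict(text_index)


def text_match(text_index, term):
    """Return the sorted doc ids whose raw text contained the given term."""
    return sorted(text_index.get(term.lower(), ()))
```

With `no_raw_data=True`, a filter on a term still resolves to the right documents through the text index, but reading the column back yields only the placeholder. This matches the constraint above: the mode suits tables where the column is never projected or filtered on outside of text_match.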
