siddharthteotia opened a new pull request #6284:
URL: https://github.com/apache/incubator-pinot/pull/6284


   Adding support for text index without raw data.
   
   Our use cases involve indexing huge blobs of text (STRING) data. We have seen the raw forward index for text data reach up to 3GB per segment. For a few of our tables, for example, supporting this mode of text index will save up to 2TB of storage per colo. These tables use the text index only in the filter clause: the raw values are neither projected nor used anywhere else in the query apart from the text_match predicate.
   
   - Like other text index configs, the behavior can be enabled on a per-index basis through the table config.
   - The actual raw value is used to build the text index.
   - For the forward index, a dummy value (which the user can also provide) is used. The default is "null".
   - The forward index can be dictionary encoded (this part is not yet covered in the PR but will be added to this same PR).
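
   To make the behavior above concrete, here is a minimal Java sketch under simplifying assumptions. `TextColumnWriterSketch`, `skipRawValues`, and `dummyValue` are hypothetical names, not Pinot classes or table-config keys, and the whitespace tokenizer is only a stand-in for a real text analyzer. Every raw value feeds the text index, while the forward index stores either the raw value or the placeholder (defaulting to "null").

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/**
 * Hypothetical sketch (not Pinot's actual writer classes): builds a text
 * index from raw values while optionally writing a placeholder into the
 * forward index instead of the raw value.
 */
final class TextColumnWriterSketch {
  // Mirrors the "null" default placeholder described in the PR.
  static final String DEFAULT_DUMMY_VALUE = "null";

  private final boolean skipRawValues; // hypothetical per-index flag
  private final String dummyValue;     // hypothetical user-provided placeholder
  private final List<String> forwardIndex = new ArrayList<>();
  private final Map<String, TreeSet<Integer>> textIndex = new HashMap<>();

  TextColumnWriterSketch(boolean skipRawValues, String dummyValue) {
    this.skipRawValues = skipRawValues;
    this.dummyValue = dummyValue != null ? dummyValue : DEFAULT_DUMMY_VALUE;
  }

  /** Adds one row; the raw value always feeds the text index. */
  void add(String rawValue) {
    int docId = forwardIndex.size();
    // Naive lowercase whitespace tokenizer standing in for a real analyzer.
    for (String token : rawValue.toLowerCase(Locale.ROOT).split("\\s+")) {
      if (!token.isEmpty()) {
        textIndex.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
      }
    }
    // The forward index keeps either the raw value or the placeholder.
    forwardIndex.add(skipRawValues ? dummyValue : rawValue);
  }

  /** Returns a copy of the doc ids containing the term (a single-term stand-in for text_match). */
  TreeSet<Integer> match(String term) {
    TreeSet<Integer> docs = textIndex.get(term.toLowerCase(Locale.ROOT));
    return docs == null ? new TreeSet<>() : new TreeSet<>(docs);
  }

  /** Reads the stored forward-index value for a doc id. */
  String forwardValue(int docId) {
    return forwardIndex.get(docId);
  }
}
```

   With `skipRawValues` enabled, a query filtering on the text index still finds the matching rows, but reading the column returns only the placeholder, which is acceptable for tables whose raw values are never projected.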
   
   A potential follow-up would be to drop the column entirely and keep only the index. However, this requires semantic changes in quite a few places in the code that assume the forward index is always present.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]