alexch2000 opened a new issue, #14984:
URL: https://github.com/apache/pinot/issues/14984

   Hey team, we’re building a funnel observability solution using Pinot, where 
we have a large table with more than 200B rows and around 300+ columns. Of 
these, only 20–30 columns are commonly used, while the rest are almost always 
NULL (sparse). Based on my understanding of Pinot’s architecture, each column’s 
forward index will store some default value for every row, even if the column 
is NULL most of the time.
   What is your recommendation for dealing with a table that has hundreds of 
mostly NULL columns, especially in terms of segment sizing, ingestion 
performance, and ongoing query efficiency?
   
   Are there any roadmap items or upcoming features aimed at optimizing storage 
or performance for extremely sparse columns (e.g., skipping or compressing 
columns with mostly NULL values)?
   
   Are there any existing best practices or configurations we could apply to 
make this more efficient (e.g., using **defaultNullValue**, dictionary vs. raw 
encoding, or segment-level pruning)?
   
   > Pinot always stores column values in a [forward 
index](https://docs.pinot.apache.org/basics/indexing/forward-index). Forward 
index never stores null values but have to store a value for each row. 
Therefore independent of the null handling configuration, Pinot always stores a 
default value for nulls rows in the forward index. The default value used in a 
column can be specified in the 
[schema](https://docs.pinot.apache.org/configuration-reference/schema) 
configuration by setting the defaultNullValue field spec. The defaultNullValue 
depends on the type of data.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to