clintropolis opened a new issue #7124: rework segment metadata query "size" 
analysis
URL: https://github.com/apache/incubator-druid/issues/7124
 
 
   # Motivation
   
   Currently segment metadata query has a `size` analysis type that the 
documentation describes as "estimated byte size for the segment columns if they 
were stored in a flat format" and "estimated total segment byte size in if it 
was stored in a flat format", but this doesn't really have any practical value 
as far as I can tell. I think this value is confusing, and should be replaced 
with a `size` value that represents the actual segment and column sizes for 
mapped segments, or the estimated size in memory for incremental segments. 
   
   This would make size analysis useful for finding which columns are the 
heavy hitters in terms of overall segment size and for observing fluctuations 
in segment size over time, and it could also aid capacity planning.
   
   Alternatively, if this doesn't seem useful enough, or seems like more trouble 
than it's worth, I would instead propose removing the value completely, since 
as it stands it is confusing and expensive to compute.
   
   # Proposed changes
   
   Making the value meaningful likely doesn't take much effort. For mapped 
columns, it would just be preserving the byte size of the columns when 
initially loading segments and making it accessible through the `BaseColumn` or 
related interface (if it isn't already and I just missed it). For incremental 
indexes, it would involve modifying the size functions to report the estimated 
in-memory size of the values where necessary. No surface-level changes would be 
necessary for this approach, but we would need to call out in the release notes 
that the meaning of the values has changed and update the documentation accordingly.
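   
   To make the two cases above concrete, here is a minimal sketch in Java. 
Every name in it (`SizedColumn`, `MappedColumn`, `IncrementalColumn`, 
`getSizeBytes`, `analyze`) is hypothetical and not part of Druid, and the 
UTF-8 length used for the incremental estimate is only a crude stand-in for a 
real in-memory size estimate.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ColumnSizeSketch {
    /** Hypothetical accessor; Druid's BaseColumn does not necessarily expose this. */
    interface SizedColumn {
        long getSizeBytes();
    }

    /** Mapped column: remember the byte length of the buffer at load time. */
    static final class MappedColumn implements SizedColumn {
        private final long sizeBytes;

        MappedColumn(ByteBuffer mapped) {
            this.sizeBytes = mapped.remaining();
        }

        @Override
        public long getSizeBytes() {
            return sizeBytes; // constant-time lookup, no scan of the values
        }
    }

    /** Incremental column: keep a running estimate as values are added. */
    static final class IncrementalColumn implements SizedColumn {
        private long estimatedBytes = 0;

        void add(String value) {
            // Assumption: UTF-8 length as a crude proxy for the in-memory footprint.
            estimatedBytes += value.getBytes(StandardCharsets.UTF_8).length;
        }

        @Override
        public long getSizeBytes() {
            return estimatedBytes;
        }
    }

    /** Per-column sizes in insertion order, as a "size" analysis might report them. */
    static Map<String, Long> analyze(Map<String, SizedColumn> columns) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        columns.forEach((name, column) -> sizes.put(name, column.getSizeBytes()));
        return sizes;
    }

    public static void main(String[] args) {
        IncrementalColumn page = new IncrementalColumn();
        page.add("Main_Page");

        Map<String, SizedColumn> columns = new LinkedHashMap<>();
        columns.put("__time", new MappedColumn(ByteBuffer.allocate(64)));
        columns.put("page", page);

        System.out.println(analyze(columns));
    }
}
```

   The point of the sketch is only that the mapped case stores a number once 
at load time, while the incremental case maintains a running estimate.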
   
   # Rationale
   
   We already need to know this size information to properly map columns from 
the 'smoosh' file, so preserving it and offering it up for segment metadata 
queries should be rather straightforward.
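   
   As a sketch of why this should be cheap: if the container's index records a 
start and end offset for each internal file, the size falls out as a 
subtraction. The `Entry` class and `sizesFromIndex` helper below are 
hypothetical illustrations, not Druid's actual smoosh classes, and the offsets 
are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SmooshSizeSketch {
    /** Hypothetical index entry: the byte range one internal file occupies. */
    static final class Entry {
        final int startOffset;
        final int endOffset; // exclusive

        Entry(int startOffset, int endOffset) {
            if (startOffset < 0 || endOffset < startOffset) {
                throw new IllegalArgumentException("invalid byte range");
            }
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        long sizeBytes() {
            return (long) endOffset - startOffset;
        }
    }

    /** Derive per-column byte sizes from the offsets needed for mapping anyway. */
    static Map<String, Long> sizesFromIndex(Map<String, Entry> index) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        index.forEach((name, entry) -> sizes.put(name, entry.sizeBytes()));
        return sizes;
    }

    public static void main(String[] args) {
        Map<String, Entry> index = new LinkedHashMap<>();
        index.put("__time", new Entry(0, 1024));   // made-up offsets
        index.put("page", new Entry(1024, 5120));

        Map<String, Long> sizes = sizesFromIndex(index);
        long total = sizes.values().stream().mapToLong(Long::longValue).sum();
        System.out.println(sizes + " total=" + total);
    }
}
```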
   
   # Operational impact
   
   Size analysis as it stands looks rather expensive. If we switch to column "in 
segment" size, this operation can become constant time for mapped segments, 
resulting in cheaper overall segment metadata queries that use this analysis on 
mapped segments. I expect little change for incremental indexes.
   
