paul-rogers commented on issue #7124: URL: https://github.com/apache/druid/issues/7124#issuecomment-893938762
+1 for this feature. As noted, without it I cannot really tell how much space a column consumes or whether the column is worth the cost. I suppose I could infer this by creating a new table without the column and comparing the sizes, but doing so is clearly a bit of a hassle. The number returned should account for all the space dedicated to the column, including any dictionary overhead, run-length encoding, and other per-column structures. It would be wonderful to have separate numbers for in-memory and on-disk sizes, if they are vastly different for some reason. The key thing we want to know is the cost of column X relative to the overall table size. So, as long as the in-memory and on-disk sizes are proportional, having one size is good enough (if it is accurate). A good check would be that the sum of the column sizes (per segment) should more or less equal the segment size, aside from any segment overhead.
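To make the proposed check concrete, here is a minimal sketch in Python. It is not Druid code: the function names, the 5% tolerance, and the byte counts are all made up for illustration, and it assumes that per-column sizes and the total segment size are available as plain integers.

```python
def column_shares(column_bytes, segment_bytes):
    """Fraction of the segment that each column accounts for."""
    return {name: size / segment_bytes for name, size in column_bytes.items()}


def sizes_are_consistent(column_bytes, segment_bytes, tolerance=0.05):
    """True if the column sizes sum to the segment size, allowing a small
    non-negative remainder for segment-level overhead.

    The 5% default tolerance is an arbitrary assumption, not a Druid value.
    """
    overhead = segment_bytes - sum(column_bytes.values())
    return 0 <= overhead <= tolerance * segment_bytes


# Hypothetical numbers for one segment (not real measurements).
columns = {"__time": 400_000, "page": 2_500_000, "added": 1_000_000}
segment_size = 4_000_000

shares = column_shares(columns, segment_size)
consistent = sizes_are_consistent(columns, segment_size)
```

With these made-up numbers, `page` accounts for 62.5% of the segment, and the 100,000 bytes not attributed to any column fall within the assumed overhead tolerance. If the reported column sizes summed to far less (or more) than the segment size, that would suggest the per-column number is missing something such as dictionary space.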
