FrankChen021 opened a new issue #10584: URL: https://github.com/apache/druid/issues/10584
### Description We're planning to add a format function that turns a number into human readable format. ### Motivation One of our use cases is to use Druid as part of application monitoring system. Most of data stored in Druid are related to byte, such as memory size, disk size, network IO byte. In most cases, theses values are very large especially after aggregation, so we have to convert these data into human readable format to make it easy to recognize when querying. Druid does not provide such a format function, so now we have to do it in the following ways: 1. a middleware service does the conversion 2. front-end projects do the conversion The downside of above solutions is obvious -- we have to write extra code to do the conversion along side the Druid SQL itself, or we have to tell our frontend engineers that they have to do the conversion on some fields. So if this format ability is shipped with Druid itself, I think it could simplify the whole development and deployment of our system. ### Requirement These requirements are based on what we have implemented in our project. 1. Support to format number in [IEC](https://en.wikipedia.org/wiki/Binary_prefix) format(binary format, such as KiB, MiB, GiB) 2. Support to format number in SI format(decimal format, such as KB, MB, GB) 2. Support to format value in simplified IEC or SI format (such as K, M, G) 4. Support customer precision from 0 to 3 5. flexible format When displaying data in a table, it's more readable to format the value as '5.00 GB', a space between the number and the unit, so that it is easy to recognize the unit. when displaying data in a card(a kind of frontend component, usually displays text in a large size), it's better to show it as '5.17G' in abbreviated unit format, no space between the number and unit ### Proposed Change MySQL has provided a similar function named [format_bytes](https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/sys-format-bytes.html), which satisfies the requirement 1 above. We're planning to add a SQL function named as `size_format`(same style as `string_format`) to serve the needs above, and some alternatives we come up with are listed below #### Alternative 1 ##### Prototype ``` size_format(expression, [format], [unit system], [precision]) ``` - unit system: format the input in which unit system. Optional value is `bin`(standing for binary, IEC format) or `dec`(decimal, SI format). Default is `bin` - precision: precision of value, ranging from 0 to 3. Default is 2 - format: indicate if there's space between number and value, and if there's a `B`(indicating byte) suffix ##### Examples ``` size_format(1024) # outputs 1.00Ki size_format(1024, 'B') #outputs 1.00KiB size_format(1024, ' B') #outputs 1.00 KiB size_format(1024, ' B', 'bin', 0) #outputs 1 KiB size_format(1024, ' B', 'bin', 1) #outputs 1.0 KiB size_format(1024, ' B', 'dec', 1) #outputs 1.02 KB size_format(1024, ' B', 'dec', 1) #outputs 1.02 KB ``` #### Alternative 2 From the view of implementation, the `format` argument of alternative 1 is not easy to control. And What if user does not provide `B` as suffix instead of other strings ? So there are rules imposed on this argument that might make it a little bit complex both for users and developers. ##### Prototype ``` size_format(expression, [unit], [precision], [has_space]) ``` - unit: unlike the definition in alternative 1, the unit here divides into 4 categories: `binary`, `binary_byte`, `decimal`, `decimal_byte`. The default is `binary_byte`. - precision: the same as alternative 1, ranging from 0 to 3, default is 2. - has_space: whether or not there's a space between the number and the unit. Default is true ##### Examples ``` size_format(1024) # outputs 1.00 KiB size_format(1024, 'binary') #outputs 1.00 Ki size_format(1024, 'binary_byte') #outputs 1.00 KiB size_format(1024, 'binary_byte', 1) #outputs 1.0 KiB size_format(1024, 'binary_byte', 1, false) #outputs 1.0KiB size_format(1024, 'decimal') #outputs 1.02 K size_format(1024, 'decimal_byte') #outputs 1.02 KB size_format(1024, 'decimal_byte', 3, false) #outputs 1.023 KB ``` Note: The use cases of `binary` in practice I think are rare, we don't plan to implement it. Currently, we tend to do it in the way of alternative 2. Want to hear what you think @gianm @jihoonson @asdf2014 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
