github-actions[bot] commented on code in PR #3412:
URL: https://github.com/apache/doris-website/pull/3412#discussion_r2931208864
##########
docs/sql-manual/sql-functions/aggregate-functions/entropy.md:
##########
@@ -0,0 +1,101 @@
+---
+{
+ "title": "ENTROPY",
+ "language": "en",
+ "description": "Calculate the Shannon entropy of all non-null values in
the specified column or expression."
+}
+---
+
+## Description
+
+Computes the Shannon entropy of all non-null values in the specified column or
expression.
+
+Entropy measures the uncertainty or randomness of a distribution. This
function builds an empirical frequency map of the input values and computes
entropy in bits using the base‑2 logarithm.
+
+The Shannon entropy is defined as:
+
+$
+Entropy(X) = -\sum_{i=1}^{k} p_i \log_2(p_i)
+$
+
+Where:
+
+- $k$ is the number of distinct non-null values
+- $p_i = \frac{\text{count}(x_i)}{\text{total non-null count}}$
+
+## Syntax
+
+```sql
+ENTROPY(<expr1> [, <expr2>, ... , <exprN>])
+```
+
+## Parameters
+
+| Parameter | Description |
+|----------|-------------|
+| `<expr1> [, <expr2>, ...]` | One or more expressions or columns. Supported
types: TinyInt, SmallInt, Integer, BigInt, LargeInt, Float, Double, Decimal,
String, IPv4/IPv6, Array, Map,Struct. When multiple expressions are provided,
their values are serialized together to form a single composite key, and
entropy is computed over the frequency distribution of these composite keys. |
+
+## Return Value
+
+Returns a DOUBLE representing the Shannon entropy in bits.
+
+- Returns NULL if all values are NULL or the input is empty.
+- Ignores NULL values during computation.
+
+## Examples
+
+```sql
+CREATE TABLE t1 (
+ id INT,
+ v INT
+) DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num"="1");
+
+INSERT INTO t1 VALUES
+ (1, 1),
+ (2, 2),
+ (3, 2),
+ (4, NULL);
+```
+
+```sql
+SELECT entropy(v) FROM t1;
+```
+
+Distribution: `{1:1, 2:2}` $H = -\left(\frac{1}{3}\log_2\frac{1}{3} +
\frac{2}{3}\log_2\frac{2}{3}\right)=0.9183$
+
+```text
++--------------------+
+| entropy(x) |
++--------------------+
Review Comment:
**Bug: wrong column name in result header.**
The query is `SELECT entropy(v) FROM t1;` but the result table shows
`entropy(x)` as the column header. It should be `entropy(v)` to match the
actual query output:
```text
+--------------------+
| entropy(v) |
+--------------------+
| 0.9182958340544896 |
+--------------------+
```
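The expected value can be sanity-checked outside Doris. Below is a minimal Python sketch of the formula from the doc, assuming NULLs map to `None`; `shannon_entropy` is a hypothetical helper written for this check, not Doris's implementation:

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Shannon entropy in bits over non-None values; None if there are none."""
    # Build the empirical frequency map, skipping NULL-like entries.
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    if total == 0:
        # Mirrors the documented behavior: all-NULL or empty input yields NULL.
        return None
    # p * log2(1/p) is equivalent to -p * log2(p) and avoids a -0.0 result.
    return sum((c / total) * log2(total / c) for c in counts.values())


# Same data as column v in t1: 1, 2, 2, NULL
print(shannon_entropy([1, 2, 2, None]))
```

For the `t1` data this prints a value of roughly 0.9183, which agrees with the result row shown in the doc.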
##########
docs/sql-manual/sql-functions/aggregate-functions/entropy.md:
##########
@@ -0,0 +1,101 @@
+---
+{
+ "title": "ENTROPY",
+ "language": "en",
+ "description": "Calculate the Shannon entropy of all non-null values in
the specified column or expression."
+}
+---
+
+## Description
+
+Computes the Shannon entropy of all non-null values in the specified column or
expression.
+
+Entropy measures the uncertainty or randomness of a distribution. This
function builds an empirical frequency map of the input values and computes
entropy in bits using the base‑2 logarithm.
+
+The Shannon entropy is defined as:
+
+$
+Entropy(X) = -\sum_{i=1}^{k} p_i \log_2(p_i)
+$
+
+Where:
+
+- $k$ is the number of distinct non-null values
+- $p_i = \frac{\text{count}(x_i)}{\text{total non-null count}}$
+
+## Syntax
+
+```sql
+ENTROPY(<expr1> [, <expr2>, ... , <exprN>])
+```
+
+## Parameters
+
+| Parameter | Description |
+|----------|-------------|
+| `<expr1> [, <expr2>, ...]` | One or more expressions or columns. Supported
types: TinyInt, SmallInt, Integer, BigInt, LargeInt, Float, Double, Decimal,
String, IPv4/IPv6, Array, Map,Struct. When multiple expressions are provided,
their values are serialized together to form a single composite key, and
entropy is computed over the frequency distribution of these composite keys. |
+
+## Return Value
+
+Returns a DOUBLE representing the Shannon entropy in bits.
+
+- Returns NULL if all values are NULL or the input is empty.
+- Ignores NULL values during computation.
+
+## Examples
+
+```sql
+CREATE TABLE t1 (
+ id INT,
+ v INT
+) DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num"="1");
+
+INSERT INTO t1 VALUES
+ (1, 1),
+ (2, 2),
+ (3, 2),
+ (4, NULL);
+```
+
+```sql
+SELECT entropy(v) FROM t1;
+```
+
+Distribution: `{1:1, 2:2}` $H = -\left(\frac{1}{3}\log_2\frac{1}{3} +
\frac{2}{3}\log_2\frac{2}{3}\right)=0.9183$
+
+```text
++--------------------+
+| entropy(x) |
++--------------------+
+| 0.9182958340544896 |
++--------------------+
+```
+
+```sql
+SELECT entropy(1);
+```
+
+Only one distinct value → entropy = 0
+
+```text
++------------+
+| entropy(1) |
++------------+
+| 0 |
++------------+
+```
+
+```sql
+SELECT entropy(NULL) FROM t1;
+```
+
+Returns NULL if all values are NULL or the input is empty.
+
+```sql
++---------------+
Review Comment:
**Bug: wrong code fence type for query output.**
This block uses ` ```sql ` but it contains query *output*, not a SQL
statement. It should use ` ```text ` instead, consistent with all other result
blocks in this file and with the zh-CN counterpart.
##########
docs/sql-manual/sql-functions/aggregate-functions/entropy.md:
##########
@@ -0,0 +1,101 @@
+---
+{
+ "title": "ENTROPY",
+ "language": "en",
+ "description": "Calculate the Shannon entropy of all non-null values in
the specified column or expression."
+}
+---
+
+## Description
+
+Computes the Shannon entropy of all non-null values in the specified column or
expression.
+
+Entropy measures the uncertainty or randomness of a distribution. This
function builds an empirical frequency map of the input values and computes
entropy in bits using the base‑2 logarithm.
+
+The Shannon entropy is defined as:
+
+$
+Entropy(X) = -\sum_{i=1}^{k} p_i \log_2(p_i)
+$
+
+Where:
+
+- $k$ is the number of distinct non-null values
+- $p_i = \frac{\text{count}(x_i)}{\text{total non-null count}}$
+
+## Syntax
+
+```sql
+ENTROPY(<expr1> [, <expr2>, ... , <exprN>])
+```
+
+## Parameters
+
+| Parameter | Description |
+|----------|-------------|
+| `<expr1> [, <expr2>, ...]` | One or more expressions or columns. Supported
types: TinyInt, SmallInt, Integer, BigInt, LargeInt, Float, Double, Decimal,
String, IPv4/IPv6, Array, Map,Struct. When multiple expressions are provided,
their values are serialized together to form a single composite key, and
entropy is computed over the frequency distribution of these composite keys. |
+
Review Comment:
**Minor: missing space before `Struct`.**
`Array, Map,Struct` should be `Array, Map, Struct` (space after the comma).
##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/entropy.md:
##########
@@ -0,0 +1,103 @@
+---
+{
+ "title": "ENTROPY",
+ "language": "zh-CN",
+ "description": "Calculate the Shannon entropy of all non-NULL values in the specified column or expression."
+}
+---
+
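+## Description
+
+Computes the Shannon entropy of all non-NULL values in the specified column or expression.
+
+Entropy measures the uncertainty or randomness of a distribution. This function builds an empirical frequency distribution from the input values and computes entropy in bits using the base-2 logarithm.
+
+The Shannon entropy is defined as: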
+
+$
+Entropy(X) = -\sum_{i=1}^{k} p_i \log_2(p_i)
+$
+
+Where:
+
+- $k$ is the number of distinct non-NULL values
+- $p_i = \frac{\text{count}(x_i)}{\text{total non-NULL count}}$
+
+## Syntax
+
+```sql
+ENTROPY(<expr1> [, <expr2>, ... , <exprN>])
+```
+
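+## Parameters
+
+| Parameter | Description |
+|------|------|
+| `<expr1> [, <expr2>, ...]` | One or more expressions or columns. Supported types include TinyInt, SmallInt, Integer, BigInt, LargeInt, Float, Double, Decimal, String, IPv4/IPv6, Array, Map, Struct, among others. When multiple columns are provided, the values in each row are serialized into a single composite key, and entropy is computed over the frequency distribution of these composite keys. |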
+
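+## Return Value
+
+Returns a DOUBLE representing the Shannon entropy in bits.
+
+- Returns NULL if all values are NULL or the input is empty.
+- NULL values are ignored during computation.
+
+## Examples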
+
+```sql
+CREATE TABLE t1 (
+ id INT,
+ v INT
+) DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num"="1");
+
+INSERT INTO t1 VALUES
+ (1, 1),
+ (2, 2),
+ (3, 2),
+ (4, NULL);
+```
+
+```sql
+SELECT entropy(v) FROM t1;
+```
+
+Frequency distribution: `{1:1, 2:2}`
+
+Entropy calculation: $H = -\left(\frac{1}{3}\log_2\frac{1}{3} + \frac{2}{3}\log_2\frac{2}{3}\right)=0.9183$
+
+```text
++--------------------+
+| entropy(x) |
++--------------------+
Review Comment:
**Bug: wrong column name in result header (same as English doc).**
The query is `SELECT entropy(v) FROM t1;` but the result table shows
`entropy(x)`. It should be `entropy(v)`:
```text
+--------------------+
| entropy(v) |
+--------------------+
| 0.9182958340544896 |
+--------------------+
```