This is an automated email from the ASF dual-hosted git repository.
yiguolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 11221e150ce [doc] json_hash (#2974)
11221e150ce is described below
commit 11221e150ce12eba05d2f4806e860c65f6ec6725
Author: Mryange <[email protected]>
AuthorDate: Thu Oct 23 15:01:40 2025 +0800
[doc] json_hash (#2974)
## Versions
- [ ] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0
## Languages
- [ ] Chinese
- [ ] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
.../scalar-functions/json-functions/json-hash.md | 171 +++++++++++++++++++++
.../scalar-functions/json-functions/json-hash.md | 171 +++++++++++++++++++++
sidebars.json | 1 +
3 files changed, 343 insertions(+)
diff --git a/docs/sql-manual/sql-functions/scalar-functions/json-functions/json-hash.md b/docs/sql-manual/sql-functions/scalar-functions/json-functions/json-hash.md
new file mode 100644
index 00000000000..4510960adff
--- /dev/null
+++ b/docs/sql-manual/sql-functions/scalar-functions/json-functions/json-hash.md
@@ -0,0 +1,171 @@
+---
+{
+ "title": "JSON_HASH",
+ "language": "en"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## Description
+
+`JSON_HASH` calculates a hash value for a JSON value. It accepts a JSON type parameter and returns a BIGINT hash value.
+
+When hashing a JSON object, the function first sorts the object's keys, so JSON objects with identical content but different key orders produce the same hash value.
+
+## Syntax
+
+```sql
+JSON_HASH(json_value)
+```
+
+## Alias
+
+`JSONB_HASH`
+
+## Parameters
+
+**json_value** - The JSON value for which to calculate a hash value. Must be of JSON type.
+
+## Return Value
+
+Returns a BIGINT hash value.
+
+When the input is NULL, the function returns NULL.
+
+## Usage
+
+The JSON standard treats the key-value pairs of a JSON object as unordered. So that JSON objects with the same content are identified consistently across different systems, `JSON_HASH` sorts the key-value pairs before calculating the hash value, similar to calling the `SORT_JSON_OBJECT_KEYS` function.
+
+Additionally, although Doris allows duplicate keys in a JSON object, the hash calculation considers only the first key-value pair for a duplicated key, which better matches real-world application scenarios.
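+
+As a hedged illustration of why order-insensitive hashing is useful, the sketch below groups JSON values by their hash. The inline rows and the aliases `payload`, `payload_hash`, and `cnt` are made up for this example and are not part of the function's interface. Given the key-sorting behavior described above, the first two rows should land in the same group.
+```sql
+-- Hypothetical inline data: the first two payloads differ only in key order.
+SELECT json_hash(payload) AS payload_hash, COUNT(*) AS cnt
+FROM (
+    SELECT cast('{"a":123, "b":456}' as json) AS payload
+    UNION ALL
+    SELECT cast('{"b":456, "a":123}' as json)
+    UNION ALL
+    SELECT cast('{"a":123}' as json)
+) t
+GROUP BY json_hash(payload);
+```
+Because distinct values can in principle share a hash, treat equal hashes as a strong hint of equality rather than a proof.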
+
+## Examples
+
+1. Basic hash calculation
+```sql
+SELECT json_hash(cast('123' as json));
+```
+```text
++--------------------------------+
+| json_hash(cast('123' as json)) |
++--------------------------------+
+|            5279066513252500087 |
++--------------------------------+
+```
+
+2. Verifying alias function
+```sql
+SELECT json_hash(cast('123' as json)), jsonb_hash(cast('123' as json));
+```
+```text
++--------------------------------+---------------------------------+
+| json_hash(cast('123' as json)) | jsonb_hash(cast('123' as json)) |
++--------------------------------+---------------------------------+
+|            5279066513252500087 |             5279066513252500087 |
++--------------------------------+---------------------------------+
+```
+As shown, `json_hash` and `jsonb_hash` produce identical hash values for the same input, confirming that they are equivalent aliases.
+
+3. Key sorting verification
+```sql
+SELECT
+ json_hash(cast('{"a":123, "b":456}' as json)),
+ json_hash(cast('{"b":456, "a":123}' as json));
+```
+```text
++-----------------------------------------------+-----------------------------------------------+
+| json_hash(cast('{"a":123, "b":456}' as json)) | json_hash(cast('{"b":456, "a":123}' as json)) |
++-----------------------------------------------+-----------------------------------------------+
+|                             82454694884268544 |                             82454694884268544 |
++-----------------------------------------------+-----------------------------------------------+
+```
+The `json_hash` function generates the same hash value regardless of key order because it sorts the keys before calculating the hash value.
+
+4. Handling duplicate keys
+```sql
+SELECT
+ json_hash(cast('{"a":123}' as json)),
+ json_hash(cast('{"a":456}' as json)),
+ json_hash(cast('{"a":123, "a":456}' as json));
+```
+```text
++--------------------------------------+--------------------------------------+-----------------------------------------------+
+| json_hash(cast('{"a":123}' as json)) | json_hash(cast('{"a":456}' as json)) | json_hash(cast('{"a":123, "a":456}' as json)) |
++--------------------------------------+--------------------------------------+-----------------------------------------------+
+|                 -7416836614234106918 |                 -3126362109586887012 |                          -7416836614234106918 |
++--------------------------------------+--------------------------------------+-----------------------------------------------+
+```
+When a JSON object contains duplicate keys (`{"a":123, "a":456}`), the `json_hash` function considers only the first key-value pair for hash calculation. As shown, the hash value of the JSON object with duplicate keys matches that of the object containing only the first key-value pair, `{"a":123}`.
+
+5. Different number type handling
+```sql
+SELECT
+ json_hash(to_json(cast('123' as int))),
+ json_hash(to_json(cast('123' as tinyint)));
+```
+```text
++----------------------------------------+--------------------------------------------+
+| json_hash(to_json(cast('123' as int))) | json_hash(to_json(cast('123' as tinyint))) |
++----------------------------------------+--------------------------------------------+
+|                    7882559133986259892 |                        5279066513252500087 |
++----------------------------------------+--------------------------------------------+
+```
+The same numeric value 123, when stored in JSON with different types (int and tinyint), produces different hash values. This is because Doris's JSON implementation preserves type information, and the hash calculation considers these type differences.
+
+6. Using normalize_json_numbers_to_double for uniform type
+```sql
+SELECT
+ json_hash(normalize_json_numbers_to_double(to_json(cast('123' as int)))),
+ json_hash(normalize_json_numbers_to_double(to_json(cast('123' as tinyint))));
+```
+```text
++--------------------------------------------------------------------------+------------------------------------------------------------------------------+
+| json_hash(normalize_json_numbers_to_double(to_json(cast('123' as int)))) | json_hash(normalize_json_numbers_to_double(to_json(cast('123' as tinyint)))) |
++--------------------------------------------------------------------------+------------------------------------------------------------------------------+
+|                                                      4028523408277343359 |                                                          4028523408277343359 |
++--------------------------------------------------------------------------+------------------------------------------------------------------------------+
+```
+This example shows how to resolve the type dependence seen in Example 5: use the `normalize_json_numbers_to_double` function to convert all numeric values to double-precision floating-point type first, then calculate the hash value. This ensures consistent hash values regardless of the original numeric type.
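+
+The same normalization can be applied to whole objects. The following is a sketch only: it assumes that text parsing stores `1` and `1.0` with different numeric types, which this page does not confirm, and the column aliases are illustrative. Output is omitted because it depends on that assumption.
+```sql
+SELECT
+    -- Compare hashes of the values as parsed.
+    json_hash(cast('{"a":1}' as json)) = json_hash(cast('{"a":1.0}' as json)) AS raw_equal,
+    -- Compare hashes after converting every number to double.
+    json_hash(normalize_json_numbers_to_double(cast('{"a":1}' as json)))
+        = json_hash(normalize_json_numbers_to_double(cast('{"a":1.0}' as json))) AS normalized_equal;
+```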
+
+7. Handling NULL values
+```sql
+SELECT json_hash(null);
+```
+```text
++-----------------+
+| json_hash(null) |
++-----------------+
+| NULL            |
++-----------------+
+```
+
+## Notes
+
+1. `JSONB_HASH` is an alias of `JSON_HASH`; the two functions behave identically.
+
+2. This function sorts the keys of JSON objects before calculating hash values, similar to calling the `SORT_JSON_OBJECT_KEYS` function.
+
+3. For duplicate keys in JSON objects, the function considers only the first occurring key-value pair for hash calculation.
+
+4. Because Doris's JSON can store numbers in different types (int, tinyint, bigint, float, double, decimal), the same numeric value stored with different types may produce different hash values. If consistency is required, use the `NORMALIZE_JSON_NUMBERS_TO_DOUBLE` function to convert all numeric values to a uniform type before calculating hash values.
+
+5. When JSON objects are created through text parsing (such as using `CAST` to convert a string to JSON), Doris automatically selects the appropriate numeric type for storage, so you typically don't need to worry about numeric type inconsistency.
+
+6. Note that this type difference arises only when a number is converted to JSON explicitly through `cast`/`to_json`, as in Example 5. When JSON is parsed from a string, Doris stores 123 only as a tinyint JSON value, so the same number is never stored as both int and tinyint.
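+
+Notes 5 and 6 can be checked against the examples above with a query like the following sketch (the column aliases are illustrative). Going by the hash values reported in Examples 1 and 5, `same_as_tinyint` should be true and `same_as_int` should be false.
+```sql
+SELECT
+    -- Text-parsed 123 versus an explicit tinyint 123.
+    json_hash(cast('123' as json)) = json_hash(to_json(cast('123' as tinyint))) AS same_as_tinyint,
+    -- Text-parsed 123 versus an explicit int 123.
+    json_hash(cast('123' as json)) = json_hash(to_json(cast('123' as int))) AS same_as_int;
+```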
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/json-functions/json-hash.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/json-functions/json-hash.md
new file mode 100644
index 00000000000..6c6fcb7db7c
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/json-functions/json-hash.md
@@ -0,0 +1,171 @@
+---
+{
+ "title": "JSON_HASH",
+ "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## 描述
+
+`JSON_HASH` 函数用于计算一个 JSON 对象的哈希值。该函数接受一个 JSON 类型的参数,并返回一个 BIGINT 类型的哈希值。
+
+在计算 JSON 对象的哈希值时,函数会对 JSON 对象的键进行排序后再计算哈希值,这样可以确保相同内容但键顺序不同的 JSON 对象会产生相同的哈希值。
+
+## 语法
+
+```sql
+JSON_HASH(json_value)
+```
+
+## 别名
+
+`JSONB_HASH`
+
+## 参数
+
+**json_value** - 需要计算哈希值的 JSON 值。必须是 JSON 类型。
+
+## 返回值
+
+返回一个 BIGINT 类型的哈希值。
+
+当输入为 NULL 时,函数返回 NULL。
+
+## 用途
+
+由于 JSON 标准规定 JSON 对象的键值对是无序的,为了确保不同系统间传递 JSON 值时能够一致地识别相同内容的 JSON 对象,`JSON_HASH` 函数会在计算哈希值前对 JSON 对象的键值对进行排序,类似于调用 `SORT_JSON_OBJECT_KEYS` 函数。
+
+此外,对于 JSON 对象中的重复键,尽管 Doris 允许这种情况存在,但计算哈希值时会只考虑第一个出现的键值对,与实际应用场景更加匹配。
+
+## 示例
+
+1. 基本哈希值计算
+```sql
+SELECT json_hash(cast('123' as json));
+```
+```text
++--------------------------------+
+| json_hash(cast('123' as json)) |
++--------------------------------+
+|            5279066513252500087 |
++--------------------------------+
+```
+
+2. 验证别名函数
+```sql
+SELECT json_hash(cast('123' as json)), jsonb_hash(cast('123' as json));
+```
+```text
++--------------------------------+---------------------------------+
+| json_hash(cast('123' as json)) | jsonb_hash(cast('123' as json)) |
++--------------------------------+---------------------------------+
+|            5279066513252500087 |             5279066513252500087 |
++--------------------------------+---------------------------------+
+```
+可以看到 `json_hash` 和 `jsonb_hash` 两个函数对相同输入产生相同的哈希值,它们是完全等价的别名函数。
+
+3. 键排序验证
+```sql
+SELECT
+ json_hash(cast('{"a":123, "b":456}' as json)),
+ json_hash(cast('{"b":456, "a":123}' as json));
+```
+```text
++-----------------------------------------------+-----------------------------------------------+
+| json_hash(cast('{"a":123, "b":456}' as json)) | json_hash(cast('{"b":456, "a":123}' as json)) |
++-----------------------------------------------+-----------------------------------------------+
+|                             82454694884268544 |                             82454694884268544 |
++-----------------------------------------------+-----------------------------------------------+
+```
+无论键的顺序如何,`json_hash` 函数都会生成相同的哈希值。这是因为函数在计算哈希值前会先对键进行排序。
+
+4. 处理重复键
+```sql
+SELECT
+ json_hash(cast('{"a":123}' as json)),
+ json_hash(cast('{"a":456}' as json)),
+ json_hash(cast('{"a":123, "a":456}' as json));
+```
+```text
++--------------------------------------+--------------------------------------+-----------------------------------------------+
+| json_hash(cast('{"a":123}' as json)) | json_hash(cast('{"a":456}' as json)) | json_hash(cast('{"a":123, "a":456}' as json)) |
++--------------------------------------+--------------------------------------+-----------------------------------------------+
+|                 -7416836614234106918 |                 -3126362109586887012 |                          -7416836614234106918 |
++--------------------------------------+--------------------------------------+-----------------------------------------------+
+```
+当 JSON 对象包含重复键时(`{"a":123, "a":456}`),`json_hash` 函数只考虑第一个出现的键值对进行哈希计算。可以看到含重复键的 JSON 对象的哈希值与只包含第一个键值对 `{"a":123}` 的哈希值相同。
+
+5. 不同数值类型的处理
+```sql
+SELECT
+ json_hash(to_json(cast('123' as int))),
+ json_hash(to_json(cast('123' as tinyint)));
+```
+```text
++----------------------------------------+--------------------------------------------+
+| json_hash(to_json(cast('123' as int))) | json_hash(to_json(cast('123' as tinyint))) |
++----------------------------------------+--------------------------------------------+
+|                    7882559133986259892 |                        5279066513252500087 |
++----------------------------------------+--------------------------------------------+
+```
+相同的数值 123,当以不同类型(int 和 tinyint)存储在 JSON 中时,会产生不同的哈希值。这是因为 Doris 的 JSON 实现保留了数据类型信息,而哈希计算会考虑这些类型差异。
+
+6. 使用 normalize_json_numbers_to_double 统一数值类型
+```sql
+SELECT
+ json_hash(normalize_json_numbers_to_double(to_json(cast('123' as int)))),
+ json_hash(normalize_json_numbers_to_double(to_json(cast('123' as tinyint))));
+```
+```text
++--------------------------------------------------------------------------+------------------------------------------------------------------------------+
+| json_hash(normalize_json_numbers_to_double(to_json(cast('123' as int)))) | json_hash(normalize_json_numbers_to_double(to_json(cast('123' as tinyint)))) |
++--------------------------------------------------------------------------+------------------------------------------------------------------------------+
+|                                                      4028523408277343359 |                                                          4028523408277343359 |
++--------------------------------------------------------------------------+------------------------------------------------------------------------------+
+```
+这个例子演示了如何解决上述问题:使用 `normalize_json_numbers_to_double` 函数先将所有数值转换为双精度浮点数类型,然后再计算哈希值。这样,不管原始数值是什么类型,转换后都会得到相同的哈希值,确保了一致性。
+
+7. 处理 NULL 值
+```sql
+SELECT json_hash(null);
+```
+```text
++-----------------+
+| json_hash(null) |
++-----------------+
+| NULL            |
++-----------------+
+```
+
+## 注意事项
+
+1. `JSON_HASH` 函数有一个别名 `JSONB_HASH`,两者功能完全相同。
+
+2. 此函数在计算哈希值前会对 JSON 对象的键进行排序,类似于调用 `SORT_JSON_OBJECT_KEYS` 函数。
+
+3. 对于 JSON 对象中的重复键,函数只会考虑第一个出现的键值对进行哈希值计算。
+
+4. 由于 Doris 的 JSON 中的数值可能以不同的存储类型(如 int、tinyint、bigint、float、double、decimal)存在,相同数值但不同类型可能会产生不同的哈希值。如果需要确保一致性,可以使用 `NORMALIZE_JSON_NUMBERS_TO_DOUBLE` 函数将所有数值转换为统一类型后再计算哈希值。
+
+5. 当通过文本解析方式(如使用 `CAST` 将字符串转为 JSON)创建 JSON 对象时,Doris 会自动选择合适的数值类型存储,通常情况下不需要担心数值类型不一致的问题。
+
+6. 需要注意的是,如果不是手动通过 `cast/to_json` 的方式转换成 JSON 对象,而是使用文本转换(从字符串解析 JSON 对象),那么 Doris 只会把 "123" 存储为一个 tinyint 类型的 JSON 对象,不会出现 "123" 既存储为 int 类型,又存储为 tinyint 类型的情况。
diff --git a/sidebars.json b/sidebars.json
index 3b77afb7e7f..91061e23cb0 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -1619,6 +1619,7 @@
"sql-manual/sql-functions/scalar-functions/json-functions/json-extract-isnull",
"sql-manual/sql-functions/scalar-functions/json-functions/json-extract-largeint",
"sql-manual/sql-functions/scalar-functions/json-functions/json-extract-string",
+ "sql-manual/sql-functions/scalar-functions/json-functions/json-hash",
"sql-manual/sql-functions/scalar-functions/json-functions/json-insert",
"sql-manual/sql-functions/scalar-functions/json-functions/json-keys",
"sql-manual/sql-functions/scalar-functions/json-functions/json-length",