Re: [PR] Docs: Add docs for Spark SQL Iceberg transform functions (#13156) [iceberg]

via GitHub Tue, 22 Jul 2025 23:51:56 -0700


nastra commented on code in PR #13194:
URL: https://github.com/apache/iceberg/pull/13194#discussion_r2224579308



##########
docs/docs/spark-functions.md:
##########
@@ -0,0 +1,262 @@
+---
+title: "Functions"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+## Spark SQL Functions for Iceberg Transforms
+
+Iceberg provides Spark SQL functions that expose internal metadata and transformation capabilities useful for working with Iceberg tables.
+
+### `system.iceberg_version()`
+
+Returns the Iceberg library version at runtime.
+
+#### Description
+Returns the current version of Iceberg used in the Spark runtime. This is useful for debugging, validation, or auditing, where you need to track which Iceberg version is in use.
+
+#### Usage
+
+```sql
+SELECT system.iceberg_version();
+```
+
+#### Returns
+A `STRING` value representing the Iceberg version, e.g., `"1.5.0"`.
+
+#### Example
+
+```sql
+SELECT system.iceberg_version();
+-- Result: 1.5.0
+```
+
+---
+
+### `system.bucket(value, num_buckets)`
+
+Returns the bucket number for a given value, using the same hash logic as Iceberg's `bucket` partition transform.
+
+---
+
+#### Description
+Computes a deterministic bucket number for the given value, compatible with Iceberg table partitioning. This is useful when validating partition logic or computing bucket values externally (e.g., in ETL pipelines).
+
+#### Arguments
+
+- `value`: A primitive type (e.g., `INT`, `STRING`, `BIGINT`).
+- `num_buckets`: Positive integer indicating the number of buckets.
+
+#### Usage
+
+```sql
+SELECT system.bucket(column_value, num_buckets);
+```
+
+#### Parameters
+
+* `column_value`: Any supported primitive type (e.g., `INT`, `STRING`) to be hashed.
+* `num_buckets`: The total number of buckets (positive integer).
+
+#### Returns
+An `INT` between `0` and `num_buckets - 1`, inclusive.
+
+#### Example
+
+```sql
+SELECT system.bucket('user_id_123', 8);
+-- Result: 3
+```
+
+### `system.years(value)`
+
+Returns the number of years since the Unix epoch (1970) for a given date or timestamp.
+
+#### Description
+
+Calculates `year(value) - 1970`. This is consistent with the Iceberg `years` partition transform, and can be used to compute partition keys manually.
+
+#### Arguments
+
+- `value`: A `DATE`, `TIMESTAMP`, or `TIMESTAMP_NTZ`. Other types will result in an error.
+
+#### Returns
+
+An `INT` representing the number of years from 1970. If the input is `NULL`, the result is also `NULL`.
+
+#### Examples
+
+```sql
+SELECT system.years(DATE '2017-12-01');
+-- Result: 47
+
+SELECT system.years(TIMESTAMP '1969-12-31 23:59:59');

Review Comment:
   Maybe also add an example with 1970, similar to how it's done in `TestSparkYearsFunction`.
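
To make the boundary cases concrete, the values on either side of the epoch can be sanity-checked outside Spark with a small Python sketch of the formula quoted in the docs above, `year(value) - 1970`. The helper name `years_since_epoch` is hypothetical, and this only models the documented behavior; it is not Iceberg's implementation and does not handle time zones.

```python
from datetime import date, datetime
from typing import Optional, Union


def years_since_epoch(value: Optional[Union[date, datetime]]) -> Optional[int]:
    """Hypothetical helper modelling the documented formula year(value) - 1970."""
    if value is None:
        # The docs state that a NULL input yields a NULL result.
        return None
    return value.year - 1970


# Values before, at, and after the epoch year.
print(years_since_epoch(datetime(1969, 12, 31, 23, 59, 59)))  # -1
print(years_since_epoch(date(1970, 1, 1)))                    # 0
print(years_since_epoch(date(2017, 12, 1)))                   # 47
```

An example at 1970 itself shows the zero point of the transform, and the 1969 value shows that dates before the epoch map to negative results.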



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

