Jubin Soni created SPARK-57579:
----------------------------------
Summary: [SQL][PYTHON] Add PySpark support for unix_nanos function
Key: SPARK-57579
URL: https://issues.apache.org/jira/browse/SPARK-57579
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 4.3.0
Reporter: Jubin Soni
*Problem*
The unix_nanos() SQL function and Scala API were added in SPARK-57527, but
PySpark support was explicitly deferred as a tracked follow-up.
The full family of epoch-unit functions exists in PySpark except for the
nanosecond member:
- unix_seconds -> pyspark.sql.functions.unix_seconds (present)
- unix_millis -> pyspark.sql.functions.unix_millis (present)
- unix_micros -> pyspark.sql.functions.unix_micros (present)
- unix_nanos -> pyspark.sql.functions.unix_nanos (MISSING)
The gap is acknowledged in the parity test:
python/pyspark/sql/tests/test_functions.py, expected_missing_in_py set:
"unix_nanos", # SPARK-57527: PySpark support tracked as a follow-up
*How to Reproduce*
from pyspark.sql import functions as sf
df = spark.sql(
    "SELECT TIMESTAMP_NTZ '2020-01-01 13:24:35.123456789' AS ts"
)
df.select(sf.unix_nanos("ts")).show()
# AttributeError: module 'pyspark.sql.functions' has no attribute 'unix_nanos'
The SQL path works fine:
spark.sql("SELECT unix_nanos(TIMESTAMP_NTZ '2020-01-01 13:24:35.123456789')")
# returns 1577885075123456789 as DECIMAL(21, 0) -- correct
*Expected:* sf.unix_nanos(col) is available and returns the same result as
the SQL unix_nanos() function (DECIMAL(21,0) nanoseconds since epoch).
*Actual:* AttributeError: the function is not exposed in the PySpark API.
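The expected value can be cross-checked without Spark. The sketch below is plain Python, not Spark code: nanos_since_epoch is a hypothetical helper, and it assumes the TIMESTAMP_NTZ literal is read as UTC wall-clock time. It also shows one plausible reason for the DECIMAL(21, 0) result type: at the top of Spark's timestamp range (year 9999) the nanosecond count no longer fits in a signed 64-bit integer.

```python
from datetime import datetime, timezone

INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def nanos_since_epoch(ts: str) -> int:
    """Hypothetical helper: 'YYYY-MM-DD HH:MM:SS[.fffffffff]' -> ns since epoch.

    Assumes the string is a UTC wall-clock time. datetime only keeps
    microseconds, so the fractional part is parsed separately as an integer.
    """
    whole, _, frac = ts.partition(".")
    moment = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    delta = moment - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    # Right-pad the fraction to exactly nine digits (nanoseconds).
    frac_nanos = int(frac.ljust(9, "0")[:9]) if frac else 0
    return whole_seconds * 1_000_000_000 + frac_nanos


example = nanos_since_epoch("2020-01-01 13:24:35.123456789")
upper = nanos_since_epoch("9999-12-31 23:59:59.999999999")

print(example)            # 1577885075123456789
print(upper > INT64_MAX)  # True: too large for a 64-bit integer
print(len(str(upper)))    # 21 digits
```

Python integers are arbitrary precision, so the arithmetic here is exact; a PySpark doctest would instead see a decimal.Decimal value for a DECIMAL(21, 0) column.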
*Work Needed*
1. python/pyspark/sql/functions/builtin.py
Add a unix_nanos() function after unix_micros (line ~11749), following the
same pattern:
@_try_remote_functions
def unix_nanos(col: "ColumnOrName") -> Column:
    """Returns the number of nanoseconds since 1970-01-01 00:00:00 UTC
    as DECIMAL(21, 0). Only supports TIMESTAMP_LTZ(p) and TIMESTAMP_NTZ(p)
    with precision p in [7, 9].
    ...
    """
    return _invoke_function_over_columns("unix_nanos", col)
2. python/pyspark/sql/functions/__init__.py
Export unix_nanos alongside unix_micros, unix_millis, and unix_seconds.
3. python/pyspark/sql/connect/functions/builtin.py
Add the Connect-side wrapper for unix_nanos, following the same structure
as unix_micros in that file.
4. python/pyspark/sql/tests/test_functions.py
Remove "unix_nanos" from the expected_missing_in_py set (and its comment).
5. Add a doctest to the unix_nanos docstring, in the style of unix_micros
(lines 11735-11747), covering:
- A nanosecond-precision TIMESTAMP_NTZ input
- A NULL input (returns NULL)
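Step 4 is needed because a parity check of this kind usually fails in both directions: on an unlisted gap, and on an exemption that is no longer true. The following is a minimal sketch of that logic, assuming the real test compares name sets in roughly this way. parity_report, _stub, and the SimpleNamespace stand-ins are hypothetical and are not taken from test_functions.py.

```python
from types import SimpleNamespace


def parity_report(jvm_names, py_module, expected_missing):
    """Compare JVM-side function names with a Python module's public names.

    Returns (unexpected_gaps, stale_exemptions):
      - unexpected_gaps: missing in Python and not listed as expected
      - stale_exemptions: listed as expected-missing but actually present
    """
    py_names = {name for name in dir(py_module) if not name.startswith("_")}
    missing = set(jvm_names) - py_names
    return missing - set(expected_missing), set(expected_missing) - missing


def _stub(col):  # placeholder body; only the attribute name matters here
    return col


# Hypothetical stand-ins for the Scala function list and pyspark.sql.functions.
jvm_names = {"unix_seconds", "unix_millis", "unix_micros", "unix_nanos"}
py_before = SimpleNamespace(unix_seconds=_stub, unix_millis=_stub, unix_micros=_stub)
py_after = SimpleNamespace(**vars(py_before), unix_nanos=_stub)
expected_missing_in_py = {"unix_nanos"}

gaps_before, stale_before = parity_report(jvm_names, py_before, expected_missing_in_py)
gaps_after, stale_after = parity_report(jvm_names, py_after, expected_missing_in_py)

print(gaps_before, stale_before)  # set() set()
print(gaps_after, stale_after)    # set() {'unix_nanos'}
```

Before the change both sets are empty, so the check passes. Once unix_nanos is exposed, the old exemption shows up as stale, which is why the entry has to be removed in the same patch.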