[ 
https://issues.apache.org/jira/browse/SPARK-57579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-57579:
-----------------------------------
    Labels: pull-request-available  (was: )

> [SQL][PYTHON] Add PySpark support for unix_nanos function
> ---------------------------------------------------------
>
>                 Key: SPARK-57579
>                 URL: https://issues.apache.org/jira/browse/SPARK-57579
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 4.3.0
>            Reporter: Jubin Soni
>            Priority: Major
>              Labels: pull-request-available
>
> *Problem*
> The unix_nanos() SQL function and Scala API were added in SPARK-57527, but 
> PySpark support was explicitly deferred as a tracked follow-up.
> The full family of epoch-unit functions exists in PySpark except for the 
> nanosecond member:
> {code:java}
>   - unix_seconds   -> pyspark.sql.functions.unix_seconds    (present)
>   - unix_millis    -> pyspark.sql.functions.unix_millis     (present)
>   - unix_micros    -> pyspark.sql.functions.unix_micros     (present)
>   - unix_nanos     -> pyspark.sql.functions.unix_nanos      (MISSING)
> {code}
> The gap is acknowledged in the parity test:
>   python/pyspark/sql/tests/test_functions.py, expected_missing_in_py set:
>     "unix_nanos",  # SPARK-57527: PySpark support tracked as a follow-up
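
The parity check mentioned above can be pictured as a set difference between the function names exposed on the JVM side and those exposed in Python. The sketch below is a simplified, self-contained illustration, not the actual test in test_functions.py: the name sets are hand-written stand-ins, `check_parity` is a hypothetical helper, and only `expected_missing_in_py` is a name taken from the issue.

```python
# Hand-written stand-ins for the two API surfaces (assumption: the real test
# discovers these names by introspection rather than hard-coding them).
jvm_functions = {"unix_seconds", "unix_millis", "unix_micros", "unix_nanos"}
py_functions = {"unix_seconds", "unix_millis", "unix_micros"}

# Functions knowingly absent from PySpark; the issue lists "unix_nanos" here.
expected_missing_in_py = {"unix_nanos"}


def check_parity(jvm, py, expected_missing):
    """Return (unexpected, stale).

    unexpected: names missing from `py` that the allow-list does not cover.
    stale: allow-list entries that are no longer missing from `py`.
    """
    missing = jvm - py
    unexpected = missing - expected_missing
    stale = expected_missing - missing
    return unexpected, stale


# Before the change: the gap is known, so nothing is flagged.
before = check_parity(jvm_functions, py_functions, expected_missing_in_py)

# After unix_nanos is added to PySpark, the allow-list entry becomes stale
# and has to be removed (item 4 under "Work Needed").
after = check_parity(
    jvm_functions, py_functions | {"unix_nanos"}, expected_missing_in_py
)
print(before, after)
```
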
> *How to Reproduce*
> {code:python}
> from pyspark.sql import functions as sf
>
> df = spark.sql(
>     "SELECT TIMESTAMP_NTZ '2020-01-01 13:24:35.123456789' AS ts"
> )
> df.select(sf.unix_nanos("ts")).show()
> # AttributeError: module 'pyspark.sql.functions' has no attribute 'unix_nanos'
> {code}
> The SQL path works fine:
> {code:python}
> spark.sql(
>     "SELECT unix_nanos(TIMESTAMP_NTZ '2020-01-01 13:24:35.123456789')"
> )
> # returns 1577885075123456789 as DECIMAL(21, 0), the expected result
> {code}
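
Because the SQL path already works, a stopgap until the Python wrapper exists is to route the call through a SQL expression string. The helper below is a hypothetical sketch and not part of PySpark: `resolve_unix_nanos` is an invented name, and it assumes the column name is a plain identifier that is safe to interpolate into `expr`. It takes the functions module as a parameter, so it prefers a native `unix_nanos` as soon as one is available.

```python
def resolve_unix_nanos(sf):
    """Return a callable mapping a column name to a unix_nanos column expression.

    `sf` is expected to be pyspark.sql.functions (or any object exposing the
    same attributes). Hypothetical helper, not part of PySpark.
    """
    native = getattr(sf, "unix_nanos", None)
    if native is not None:
        # The Python wrapper is available: use it directly.
        return native
    # Fallback: let the SQL parser resolve the function by name.
    # Assumes `col_name` is a plain identifier; no quoting or escaping is done.
    return lambda col_name: sf.expr(f"unix_nanos({col_name})")
```

On a Spark build that includes SPARK-57527, usage would look like `df.select(resolve_unix_nanos(sf)("ts"))` with `sf` imported from `pyspark.sql`.
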
> *Expected:* sf.unix_nanos(col) is available and returns the same result as 
> the SQL unix_nanos() function (DECIMAL(21,0) nanoseconds since epoch).
> *Actual:* AttributeError; the function is not exposed in the PySpark API.
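
The expected value can be cross-checked without Spark. The sketch below assumes that a TIMESTAMP_NTZ wall-clock value is counted from 1970-01-01 00:00:00 as if it were UTC, and that timestamps range up to the end of year 9999; `wall_clock_to_nanos` is a hypothetical helper name. Under those assumptions it also illustrates why a signed 64-bit integer is too narrow for the full range while 21 decimal digits suffice.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def wall_clock_to_nanos(ts: str) -> int:
    """Nanoseconds since the epoch for 'YYYY-MM-DD HH:MM:SS[.fffffffff]'.

    Treats the wall-clock value as UTC. The fractional part is handled with
    integer arithmetic because datetime stops at microsecond precision.
    """
    whole, _, frac = ts.partition(".")
    parsed = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
    seconds = (parsed - EPOCH) // timedelta(seconds=1)
    nanos = int(frac.ljust(9, "0")[:9]) if frac else 0
    return seconds * 1_000_000_000 + nanos


example = wall_clock_to_nanos("2020-01-01 13:24:35.123456789")
largest = wall_clock_to_nanos("9999-12-31 23:59:59.999999999")
INT64_MAX = 2**63 - 1

print(example)              # 1577885075123456789
print(len(str(largest)))    # 21
print(largest > INT64_MAX)  # True
```
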
> *Work Needed*
> 1. python/pyspark/sql/functions/builtin.py
>    Add unix_nanos() function after unix_micros (line ~11749), following the 
> same pattern as unix_micros:
> {code:python}
> @_try_remote_functions
> def unix_nanos(col: "ColumnOrName") -> Column:
>     """Returns the number of nanoseconds since 1970-01-01 00:00:00 UTC
>     as DECIMAL(21, 0). Only supports TIMESTAMP_LTZ(p) and TIMESTAMP_NTZ(p)
>     with precision p in [7, 9].
>     ...
>     """
>     return _invoke_function_over_columns("unix_nanos", col)
> {code}
> 2. python/pyspark/sql/functions/__init__.py
>    Export unix_nanos in __init__.py alongside unix_micros, unix_millis, 
> and unix_seconds.
> 3. python/pyspark/sql/connect/functions/builtin.py
>    Add the Connect-side wrapper for unix_nanos, following the same structure 
> as unix_micros in that file.
> 4. python/pyspark/sql/tests/test_functions.py
>    Remove "unix_nanos" from the expected_missing_in_py set (and its comment).
> 5. Add a doctest in the unix_nanos docstring, following the style of 
> unix_micros (lines 11735-11747), covering:
>    - A nanosecond-precision TIMESTAMP_NTZ input
>    - A NULL input (returns NULL)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
