Jubin Soni created SPARK-57579:
----------------------------------

             Summary: [SQL][PYTHON] Add PySpark support for unix_nanos function
                 Key: SPARK-57579
                 URL: https://issues.apache.org/jira/browse/SPARK-57579
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 4.3.0
            Reporter: Jubin Soni


*Problem*

The unix_nanos() SQL function and Scala API were added in SPARK-57527, but
PySpark support was explicitly deferred as a tracked follow-up.

The full family of epoch-unit functions exists in PySpark except for the
nanosecond member:
  - unix_seconds   -> pyspark.sql.functions.unix_seconds    (present)
  - unix_millis    -> pyspark.sql.functions.unix_millis     (present)
  - unix_micros    -> pyspark.sql.functions.unix_micros     (present)
  - unix_nanos     -> pyspark.sql.functions.unix_nanos      (MISSING)

The gap is acknowledged in the parity test:
  python/pyspark/sql/tests/test_functions.py, expected_missing_in_py set:
    "unix_nanos",  # SPARK-57527: PySpark support tracked as a follow-up
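The role of that allowlist can be sketched in a few lines. This is an illustration only, not the actual test code: check_parity and the three name sets are hypothetical, and it assumes the parity test requires the set of functions missing from PySpark to match the allowlist exactly.

```python
def check_parity(jvm_names, py_names, expected_missing_in_py):
    """Return (unexpected_missing, stale_entries) for a function-parity check.

    unexpected_missing: functions absent from Python that are not allowlisted.
    stale_entries: allowlisted names that are in fact present in Python.
    """
    missing = set(jvm_names) - set(py_names)
    unexpected_missing = missing - set(expected_missing_in_py)
    stale_entries = set(expected_missing_in_py) - missing
    return unexpected_missing, stale_entries


# Hypothetical name sets standing in for the real function listings.
jvm = {"unix_seconds", "unix_millis", "unix_micros", "unix_nanos"}
py_before = {"unix_seconds", "unix_millis", "unix_micros"}
allowlist = {"unix_nanos"}

# Today: the gap is allowlisted, so nothing is reported.
before = check_parity(jvm, py_before, allowlist)

# Once unix_nanos is added to PySpark, the allowlist entry becomes stale.
after = check_parity(jvm, py_before | {"unix_nanos"}, allowlist)
```

Under that assumption, adding the Python function without also removing the allowlist entry would leave a stale entry behind, which is why the two changes belong in the same patch.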


*How to Reproduce*

  from pyspark.sql import functions as sf
  df = spark.sql(
      "SELECT TIMESTAMP_NTZ '2020-01-01 13:24:35.123456789' AS ts"
  )
  df.select(sf.unix_nanos("ts")).show()

  # AttributeError: module 'pyspark.sql.functions' has no attribute 'unix_nanos'

The SQL path works fine:
  spark.sql("SELECT unix_nanos(TIMESTAMP_NTZ '2020-01-01 13:24:35.123456789')")
  # returns 1577885075123456789 as DECIMAL(21, 0)  -- correct
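For cross-checking the value returned by the SQL path, the nanosecond count can be recomputed in plain Python without Spark. This is a minimal sketch: unix_nanos_from_literal is a hypothetical helper, and it assumes the TIMESTAMP_NTZ wall-clock value is interpreted as UTC. Python's datetime only carries microseconds, so the fractional part is handled as an integer.

```python
from datetime import datetime, timezone


def unix_nanos_from_literal(literal: str) -> int:
    """Nanoseconds since 1970-01-01 00:00:00 UTC.

    Accepts 'YYYY-MM-DD HH:MM:SS[.f]' with up to 9 fractional digits and
    treats the wall-clock value as UTC (an assumption of this sketch).
    """
    whole, _, frac = literal.partition(".")
    parsed = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = parsed - epoch
    # Integer arithmetic only, to avoid float rounding at nanosecond scale.
    seconds = delta.days * 86_400 + delta.seconds
    nanos = int(frac.ljust(9, "0")[:9]) if frac else 0
    return seconds * 1_000_000_000 + nanos


value = unix_nanos_from_literal("2020-01-01 13:24:35.123456789")
# The result must fit in DECIMAL(21, 0), i.e. at most 21 digits.
fits_decimal_21 = len(str(abs(value))) <= 21
```

Comparing value with the number shown by spark.sql(...) gives a quick sanity check that is independent of the Spark implementation.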

*Expected:* sf.unix_nanos(col) is available and returns the same result as
the SQL unix_nanos() function (DECIMAL(21,0) nanoseconds since epoch).

*Actual:* AttributeError: the function is not exposed in the PySpark API.
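Until the wrapper exists, the SQL function can still be reached from the DataFrame API through pyspark.sql.functions.expr. The following is a possible interim workaround, not part of the proposed change: unix_nanos_sql and unix_nanos_compat are hypothetical helper names, and it assumes a Spark build that already includes the SQL unix_nanos function from SPARK-57527.

```python
def unix_nanos_sql(column_name: str) -> str:
    """Build the SQL expression text unix_nanos(`<column>`).

    The column name is backtick-quoted, with embedded backticks doubled.
    """
    quoted = "`" + column_name.replace("`", "``") + "`"
    return f"unix_nanos({quoted})"


def unix_nanos_compat(column_name: str):
    """Return a Column that invokes the SQL unix_nanos function via expr()."""
    # Imported lazily so that the string helper above works without PySpark.
    from pyspark.sql import functions as sf

    return sf.expr(unix_nanos_sql(column_name))


# Usage with the DataFrame from the reproduction (requires a Spark session):
#   df.select(unix_nanos_compat("ts")).show()
```

This only hides the gap at call sites; the native sf.unix_nanos wrapper is still needed for parity with the other epoch-unit functions.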


*Work Needed*

1. python/pyspark/sql/functions/builtin.py
   Add a unix_nanos() function after unix_micros (line ~11749), following
   the same pattern:

     @_try_remote_functions
     def unix_nanos(col: "ColumnOrName") -> Column:
         """Returns the number of nanoseconds since 1970-01-01 00:00:00 UTC
         as DECIMAL(21, 0). Only supports TIMESTAMP_LTZ(p) and TIMESTAMP_NTZ(p)
         with precision p in [7, 9].
         ...
         """
         return _invoke_function_over_columns("unix_nanos", col)

2. python/pyspark/sql/functions/__init__.py
   Export unix_nanos alongside unix_micros, unix_millis, and unix_seconds.

3. python/pyspark/sql/connect/functions/builtin.py
   Add the Connect-side wrapper for unix_nanos, following the same structure
   as unix_micros in that file.

4. python/pyspark/sql/tests/test_functions.py
   Remove "unix_nanos" from the expected_missing_in_py set (and its comment).

5. Add a doctest to the unix_nanos docstring, following the style of
   unix_micros (lines 11735-11747), that covers:
   - A nanosecond-precision TIMESTAMP_NTZ input
   - A NULL input (returns NULL)
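Once the steps above are done, a quick smoke check can confirm that the name is reachable from both the classic and the Connect function modules. This is a rough sketch: modules_exposing is a hypothetical helper, and a module that cannot be imported (for example when the Connect dependencies are not installed) is reported as None rather than as a failure.

```python
import importlib


def modules_exposing(name, module_paths):
    """Map each module path to True/False (attribute present or not),
    or None when the module itself cannot be imported."""
    result = {}
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            result[path] = None
            continue
        result[path] = hasattr(module, name)
    return result


if __name__ == "__main__":
    print(
        modules_exposing(
            "unix_nanos",
            ["pyspark.sql.functions", "pyspark.sql.connect.functions"],
        )
    )
```

A True for both modules indicates that steps 1 to 3 took effect; it does not replace the parity test or the doctest.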



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
