[ 
https://issues.apache.org/jira/browse/SPARK-57579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jubin Soni updated SPARK-57579:
-------------------------------
    Description: 
*Problem*

The unix_nanos() SQL function and Scala API were added in SPARK-57527, but 
PySpark support was explicitly deferred as a tracked follow-up.

The full family of epoch-unit functions exists in PySpark except for the 
nanosecond member:
{noformat}
  - unix_seconds   -> pyspark.sql.functions.unix_seconds    (present)
  - unix_millis    -> pyspark.sql.functions.unix_millis     (present)
  - unix_micros    -> pyspark.sql.functions.unix_micros     (present)
  - unix_nanos     -> pyspark.sql.functions.unix_nanos      (MISSING)
{noformat}
The gap is acknowledged in the parity test:
  python/pyspark/sql/tests/test_functions.py, expected_missing_in_py set:
    "unix_nanos",  # SPARK-57527: PySpark support tracked as a follow-up
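
A parity check of this kind can be sketched in plain Python. This is a simplified illustration, not the actual test in test_functions.py: the two name sets are hard-coded stand-ins (the real test presumably derives them from the Scala and Python modules), and missing_in_py is a hypothetical helper.

```python
# Simplified stand-in for a Scala/Python function parity check.
# Both name sets are hard-coded assumptions for illustration only.
scala_functions = {"unix_seconds", "unix_millis", "unix_micros", "unix_nanos"}
python_functions = {"unix_seconds", "unix_millis", "unix_micros"}

# Names that are knowingly absent from PySpark, each with a tracking ticket.
expected_missing_in_py = {
    "unix_nanos",  # SPARK-57527: PySpark support tracked as a follow-up
}


def missing_in_py(scala_names, python_names):
    """Return the Scala-side names that have no Python counterpart."""
    return set(scala_names) - set(python_names)


# The check holds only while the actual gap equals the documented gap, so
# adding unix_nanos to PySpark also requires removing it from the expected set.
assert missing_in_py(scala_functions, python_functions) == expected_missing_in_py
```

Under this sketch, exposing unix_nanos in Python without updating expected_missing_in_py would make the comparison fail, which is why the set has to be edited as part of the same change.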

*How to Reproduce*
{code:python}
from pyspark.sql import functions as sf

# Assumes an active SparkSession bound to the name "spark".
df = spark.sql(
    "SELECT TIMESTAMP_NTZ '2020-01-01 13:24:35.123456789' AS ts"
)
df.select(sf.unix_nanos("ts")).show()
# AttributeError: module 'pyspark.sql.functions' has no attribute 'unix_nanos'
{code}
The SQL path works fine:
{code:python}
spark.sql("SELECT unix_nanos(TIMESTAMP_NTZ '2020-01-01 13:24:35.123456789')")
# returns 1577885075123456789 as DECIMAL(21, 0), as expected
{code}
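
The epoch value for the timestamp above can be cross-checked with integer arithmetic in plain Python, which is useful when writing the doctest. This is a sketch under one assumption: the TIMESTAMP_NTZ wall-clock value is read as if it were UTC. wall_clock_to_unix_nanos is a hypothetical helper, not part of Spark.

```python
from datetime import datetime, timezone


def wall_clock_to_unix_nanos(text: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS[.fffffffff]', read as UTC, to epoch nanoseconds."""
    seconds_part, _, fraction = text.partition(".")
    # datetime keeps only microseconds, so the fraction is handled as an integer.
    whole = datetime.strptime(seconds_part, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = whole - epoch
    epoch_seconds = delta.days * 86_400 + delta.seconds
    # Right-pad the fraction to nine digits so ".5" means 500,000,000 ns.
    nanos = int(fraction.ljust(9, "0")[:9]) if fraction else 0
    return epoch_seconds * 1_000_000_000 + nanos


print(wall_clock_to_unix_nanos("2020-01-01 13:24:35.123456789"))
```

The result has 19 digits, so it fits comfortably in DECIMAL(21, 0).
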
*Expected:* sf.unix_nanos(col) is available and returns the same result as the 
SQL unix_nanos() function (DECIMAL(21,0) nanoseconds since epoch).

*Actual:* AttributeError: the function is not exposed in the PySpark API.
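
Until the wrapper exists, the SQL function can presumably still be reached from the DataFrame API through sf.expr, which parses a SQL expression string. A minimal sketch, assuming a Spark build that already includes the SQL function from SPARK-57527; unix_nanos_sql and unix_nanos_via_expr are hypothetical helper names, and PySpark is imported lazily so the string-building part runs without it.

```python
def unix_nanos_sql(column_name: str) -> str:
    """Build the SQL expression text, backtick-quoting the column name."""
    quoted = column_name.replace("`", "``")  # escape embedded backticks
    return f"unix_nanos(`{quoted}`)"


def unix_nanos_via_expr(column_name: str):
    """Return a Column that calls the SQL unix_nanos function on column_name."""
    # Imported here so that defining the helper does not require PySpark.
    from pyspark.sql import functions as sf

    return sf.expr(unix_nanos_sql(column_name))


print(unix_nanos_sql("ts"))
```

With the DataFrame from the reproduction steps, the call would be df.select(unix_nanos_via_expr("ts")).show().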

*Work Needed*

1. python/pyspark/sql/functions/builtin.py
   Add unix_nanos() after unix_micros (line ~11749), following the same
   pattern as unix_micros:
{code:python}
@_try_remote_functions
def unix_nanos(col: "ColumnOrName") -> Column:
    """Returns the number of nanoseconds since 1970-01-01 00:00:00 UTC
    as DECIMAL(21, 0). Only supports TIMESTAMP_LTZ(p) and TIMESTAMP_NTZ(p)
    with precision p in [7, 9].
    ...
    """
    return _invoke_function_over_columns("unix_nanos", col)
{code}
2. python/pyspark/sql/functions/__init__.py
   Export unix_nanos in __init__.py alongside unix_micros, unix_millis, and
   unix_seconds.

3. python/pyspark/sql/connect/functions/builtin.py
   Add the Connect-side wrapper for unix_nanos, following the same structure as 
unix_micros in that file.

4. python/pyspark/sql/tests/test_functions.py
   Remove "unix_nanos" from the expected_missing_in_py set (and its comment).

5. Add a doctest in the unix_nanos docstring, following the style of
   unix_micros (lines 11735-11747), covering:
   - A nanosecond-precision TIMESTAMP_NTZ input
   - A NULL input (returns NULL)
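
Once the steps above are done, a quick smoke check can confirm that the name is exported from both the classic and the Connect function modules. This is a sketch, not part of the Spark test suite: check_exported is a hypothetical helper, and it reports None for a module that cannot be imported (for example when PySpark or the Connect dependencies are not installed).

```python
import importlib

MODULES = ("pyspark.sql.functions", "pyspark.sql.connect.functions")


def check_exported(name, modules=MODULES):
    """Map each module to True/False, or None when the module cannot be imported."""
    results = {}
    for module_name in modules:
        try:
            module = importlib.import_module(module_name)
        except Exception:
            # Missing package or missing optional dependencies: status unknown.
            results[module_name] = None
            continue
        results[module_name] = hasattr(module, name)
    return results


for module_name, status in check_exported("unix_nanos").items():
    print(f"{module_name}: {status}")
```

A False entry for either module would indicate that step 2 or step 3 is still incomplete.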


> [SQL][PYTHON] Add PySpark support for unix_nanos function
> ---------------------------------------------------------
>
>                 Key: SPARK-57579
>                 URL: https://issues.apache.org/jira/browse/SPARK-57579
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 4.3.0
>            Reporter: Jubin Soni
>            Priority: Major
>


