[
https://issues.apache.org/jira/browse/SPARK-57741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Max Gekk reassigned SPARK-57741:
--------------------------------
Assignee: Jubin Soni
> Add timestamp_nanos to PySpark public API
> -----------------------------------------
>
> Key: SPARK-57741
> URL: https://issues.apache.org/jira/browse/SPARK-57741
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.3.0, 5.0.0
> Reporter: Jubin Soni
> Assignee: Jubin Soni
> Priority: Minor
> Labels: pull-request-available
>
> *What is the issue?*
> {{timestamp_nanos(e)}} converts a nanoseconds-since-epoch integer to a
> {{TIMESTAMP_LTZ(9)}} value. It is the inverse of {{unix_nanos}} and was added
> to the Scala API in SPARK-57526. PySpark support was explicitly deferred and
> tracked as a follow-up: the function is listed in {{expected_missing_in_py}}
> in {{python/pyspark/sql/tests/test_functions.py}}:
> {{expected_missing_in_py = {
> "timestamp_nanos"
> } # SPARK-57526: PySpark support tracked as a follow-up}}
> Without this function, PySpark users can convert a nanosecond-precision
> timestamp to epoch nanoseconds via {{unix_nanos}}, but cannot convert
> back, leaving the round-trip incomplete in Python.
> ----
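To illustrate the round trip described above outside of Spark, here is a minimal pure-Python sketch of the two conversions. The helper names {{epoch_nanos_to_text}} and {{text_to_epoch_nanos}} are hypothetical and not part of PySpark; the sketch assumes UTC and carries the nanosecond fraction as a separate integer, because Python's {{datetime}} only stores microseconds.

```python
from datetime import datetime, timedelta, timezone

NANOS_PER_SECOND = 1_000_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_nanos_to_text(ns: int) -> str:
    """Render nanoseconds since the Unix epoch as a UTC timestamp with 9 fractional digits."""
    # divmod floors, so pre-1970 (negative) inputs still yield a fraction in [0, 1e9).
    seconds, nanos = divmod(ns, NANOS_PER_SECOND)
    whole = EPOCH + timedelta(seconds=seconds)
    return f"{whole:%Y-%m-%d %H:%M:%S}.{nanos:09d}"


def text_to_epoch_nanos(text: str) -> int:
    """Inverse of epoch_nanos_to_text: parse the text back into epoch nanoseconds."""
    whole_text, nanos_text = text.split(".")
    whole = datetime.strptime(whole_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas keeps the arithmetic exact (no floats involved).
    seconds = (whole - EPOCH) // timedelta(seconds=1)
    return seconds * NANOS_PER_SECOND + int(nanos_text)


if __name__ == "__main__":
    print(epoch_nanos_to_text(1577885075123456789))
```

The first helper mirrors the direction that is missing in PySpark, the second mirrors the direction that already exists; applying one after the other returns the original integer.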
> *How to reproduce*
> {{from pyspark.sql import functions as sf
> # unix_nanos exists:
> sf.unix_nanos("ts") # works
> # timestamp_nanos does not:
> sf.timestamp_nanos(sf.lit(1577885075123456789)) # AttributeError}}
> ----
> *Actual behavior*
> {{AttributeError: module 'pyspark.sql.functions' has no attribute
> 'timestamp_nanos'}}
> ----
> *Expected behavior*
> {{from pyspark.sql import functions as sf
> df = spark.sql("SELECT BIGINT('1577885075123456789') AS ns")
> df.select(sf.timestamp_nanos("ns")).show(truncate=False)
> # +-----------------------------+
> # |timestamp_nanos(ns)          |
> # +-----------------------------+
> # |2020-01-01 13:24:35.123456789|
> # +-----------------------------+}}
> ----
> *Proposed fix*
> Follow the pattern of {{timestamp_micros}}:
> * Add {{timestamp_nanos(col)}} to {{python/pyspark/sql/functions/builtin.py}}
> * Add a Connect-side wrapper in
> {{python/pyspark/sql/connect/functions/builtin.py}}
> * Export from {{python/pyspark/sql/functions/__init__.py}}
> * Add an entry to {{python/docs/source/reference/pyspark.sql/functions.rst}}
> * Remove {{"timestamp_nanos"}} from {{expected_missing_in_py}} in
> {{python/pyspark/sql/tests/test_functions.py}}
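
The checklist above could be sanity-checked against a local Spark checkout with a small script along these lines. This is only a sketch: {{files_missing_name}} is a hypothetical helper, it performs a plain substring search, and it covers only the four files where the new name should appear. It does not verify that the entry was removed from {{expected_missing_in_py}}.

```python
from pathlib import Path

# Files from the list above in which the new name should appear after the change.
EXPECTED_FILES = (
    "python/pyspark/sql/functions/builtin.py",
    "python/pyspark/sql/connect/functions/builtin.py",
    "python/pyspark/sql/functions/__init__.py",
    "python/docs/source/reference/pyspark.sql/functions.rst",
)


def files_missing_name(repo_root: str, name: str = "timestamp_nanos") -> list[str]:
    """Return the expected files that are absent or do not mention `name`."""
    missing = []
    for relative in EXPECTED_FILES:
        path = Path(repo_root) / relative
        # A missing file counts the same as a file that lacks the name.
        if not path.is_file() or name not in path.read_text(encoding="utf-8"):
            missing.append(relative)
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the root of a Spark checkout.
    for relative in files_missing_name("."):
        print(f"timestamp_nanos not found in {relative}")
```

An empty result means every listed file mentions the name; it says nothing about whether the wrapper itself is correct, which remains the job of the PySpark test suite.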