[ 
https://issues.apache.org/jira/browse/SPARK-32046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Smith updated SPARK-32046:
---------------------------------
    Description: 
If I call current_timestamp 3 times, caching each dataframe variable in 
order to freeze that dataframe's time, the 3rd dataframe's time and beyond 
(4th, 5th, ...) will be frozen to the 2nd dataframe's time. The 1st and 2nd 
dataframes differ in time, but the time becomes static on the 3rd usage and 
beyond.

Additionally, only 2 of the dataframes actually cache; the 3rd is skipped. 
However,
{code:java}
val df = Seq(java.time.LocalDateTime.now.toString).toDF("datetime").cache
df.count

// this can be run 3 times with no issue.{code}
does not have this problem: all 3 dataframes cache and display the correct 
times.

Running the code in the shell versus Jupyter or Zeppelin also produces 
different results. In the shell, all 3 dataframes show the same single time.
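
To verify which dataframes actually cached, the storage level of each can be 
inspected after its count. This is a quick check using the public 
Dataset.storageLevel API; df1/df2/df3 refer to the repro below:

{code:java}
// After each dfN.count, check whether the plan was actually persisted.
// storageLevel returns StorageLevel.NONE when the dataframe is not cached.
println(df1.storageLevel)
println(df2.storageLevel)
println(df3.storageLevel) // per the report above, the 3rd is never cached
{code}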

 
{code:java}
val df1 = spark.range(1).select(current_timestamp as "datetime").cache
df1.count

df1.show(false)

Thread.sleep(9500)

val df2 = spark.range(1).select(current_timestamp as "datetime").cache
df2.count 

df2.show(false)

Thread.sleep(9500)

val df3 = spark.range(1).select(current_timestamp as "datetime").cache 
df3.count 

df3.show(false){code}
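
For comparison, one possible workaround sketch (my assumption, not a fix for 
the underlying bug) is to capture the time on the driver and embed it as a 
literal column, so the value is fixed at plan construction regardless of 
caching behavior:

{code:java}
import java.sql.Timestamp
import org.apache.spark.sql.functions.lit

// The literal is evaluated once on the driver, so each dataframe keeps its
// own construction-time value even if current_timestamp caching misbehaves.
val dfLit = spark.range(1)
  .select(lit(new Timestamp(System.currentTimeMillis)) as "datetime")
  .cache
dfLit.count
dfLit.show(false)
{code}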

  was:
If I call current_timestamp 3 times while caching the dataframe variable in 
order to freeze that dataframes time, the 3rd dataframe time and beyond (4th, 
5th, ...) will be frozen to the 2nd dataframe's time. The 1st dataframe and the 
2nd will differ in time but will become static on the 3rd usage and beyond.

Additionally, caching only caused 2 dataframes to cache skipping the 3rd. 
However, `Seq(java.time.LocalDateTime.now.toString).toDF("datetime").cache` 
doesn't have this problem and all 3 dataframes cache with correct times 
displaying.

 
{code:java}
val df1 = spark.range(1).select(current_timestamp as "datetime").cache
df1.count

df1.show(false)

Thread.sleep(9500)

val df2 = spark.range(1).select(current_timestamp as "datetime").cache
df2.count 

df2.show(false)

Thread.sleep(9500)

val df3 = spark.range(1).select(current_timestamp as "datetime").cache 
df3.count 

df3.show(false){code}


> current_timestamp called in a cached dataframe freezes the time for all 
> future calls
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-32046
>                 URL: https://issues.apache.org/jira/browse/SPARK-32046
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0, 2.4.4
>            Reporter: Dustin Smith
>            Priority: Minor
>
> If I call current_timestamp 3 times, caching each dataframe variable in 
> order to freeze that dataframe's time, the 3rd dataframe's time and beyond 
> (4th, 5th, ...) will be frozen to the 2nd dataframe's time. The 1st and 2nd 
> dataframes differ in time, but the time becomes static on the 3rd usage and 
> beyond.
> Additionally, only 2 of the dataframes actually cache; the 3rd is skipped. 
> However,
> However,
> {code:java}
> val df = Seq(java.time.LocalDateTime.now.toString).toDF("datetime").cache
> df.count
> // this can be run 3 times with no issue.{code}
> does not have this problem: all 3 dataframes cache and display the correct 
> times.
> Running the code in the shell versus Jupyter or Zeppelin also produces 
> different results. In the shell, all 3 dataframes show the same single time.
>  
> {code:java}
> val df1 = spark.range(1).select(current_timestamp as "datetime").cache
> df1.count
> df1.show(false)
> Thread.sleep(9500)
> val df2 = spark.range(1).select(current_timestamp as "datetime").cache
> df2.count 
> df2.show(false)
> Thread.sleep(9500)
> val df3 = spark.range(1).select(current_timestamp as "datetime").cache 
> df3.count 
> df3.show(false){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
