[
https://issues.apache.org/jira/browse/SPARK-25377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16608896#comment-16608896
]
Hyukjin Kwon commented on SPARK-25377:
--------------------------------------
Can you post a self-contained reproducer?
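For illustration, a self-contained reproducer might look roughly like the sketch below. It is not the reporter's code: the object name `CacheReproducer`, the `time` helper, the synthetic `spark.range` data and the local master are all assumptions made so that the example runs without external inputs. It times two consecutive `count()` calls on a cached DataFrame and prints the storage level, so the two timings can be compared directly.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical reproducer; names and data are illustrative assumptions.
object CacheReproducer {

  // Runs `body`, prints the elapsed wall-clock time, and returns the result.
  def time[T](label: String)(body: => T): T = {
    val start = System.nanoTime()
    val out = body
    println(f"$label: ${(System.nanoTime() - start) / 1e6}%.0f ms")
    out
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local mode, for a self-contained run
      .appName("cache-reproducer")
      .getOrCreate()

    // Synthetic stand-in for the `result` DataFrame in the report:
    // an aggregation that yields 5 rows.
    val result = spark.range(0L, 5000000L)
      .selectExpr("id % 5 AS k", "id AS v")
      .groupBy("k")
      .count()

    // cache() is lazy: the first action materializes the cached data,
    // and later actions on the same plan should read from the cache.
    val cached = result.cache()

    time("first count")(cached.count())
    time("second count")(cached.count())

    // Prints the storage level Spark has registered for this DataFrame.
    println(s"storage level: ${cached.storageLevel}")

    spark.stop()
  }
}
```

If the second count is not noticeably faster in a run like this, the timings together with the printed storage level and the output of `cached.explain()` would be useful details to attach to the issue.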
> spark sql dataframe cache is invalid
> ------------------------------------
>
> Key: SPARK-25377
> URL: https://issues.apache.org/jira/browse/SPARK-25377
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.0
> Environment: spark version 2.3.0
> scala version 2.11.8
> Reporter: Iverson Hu
> Priority: Major
>
> When I use a Spark SQL DataFrame in my application, dataframe.cache appears
> to have no effect: the first action, such as count(), took 40 seconds, and
> the second run of the same action took just as long. When I use
> dataframe.rdd.cache instead, the second execution is faster than the first.
> I think this is a bug in Spark SQL DataFrames.
> Below are my code and console log; the result DataFrame had already been
> cached before this code ran.
> This is my code:
> logger.info("start to consuming result count")
> logger.info(s"consuming ${result.count} output records")
> //result.show(false)
> logger.info("starting go to MysqlSink")
> logger.info(s"consuming ${result.count} output records")
> logger.info("starting go to MysqlSink")
>
> The console log is below:
> 18/09/08 14:15:17 INFO MySQLRiskScenarioRunner: start to consuming result count
> 18/09/08 14:15:49 INFO MySQLRiskScenarioRunner: consuming 5 output records
> 18/09/08 14:15:49 INFO MySQLRiskScenarioRunner: starting go to MysqlSink
> 18/09/08 14:16:22 INFO MySQLRiskScenarioRunner: consuming 5 output records
> 18/09/08 14:16:22 INFO MySQLRiskScenarioRunner: starting go to MysqlSink