lordk911 opened a new issue #1472:
URL: https://github.com/apache/iceberg/issues/1472


   I'm testing with Spark 3.0.1, CDH 5.14, and Iceberg 0.9.1 in spark-shell.
   The catalog config is:
   
   spark.sql.catalog.hadoop_prod               org.apache.iceberg.spark.SparkCatalog
   spark.sql.catalog.hadoop_prod.type          hadoop
   spark.sql.catalog.hadoop_prod.warehouse     hdfs://hdfsnamespace/user/hive/warehouse
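   
   For completeness, the same catalog settings expressed programmatically (a
   sketch only; in my actual setup they are passed as --conf flags to
   spark-shell):
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   // Sketch: applying the same hadoop_prod catalog config via the builder.
   val spark = SparkSession.builder()
     .appName("iceberg-streaming-test")
     .config("spark.sql.catalog.hadoop_prod", "org.apache.iceberg.spark.SparkCatalog")
     .config("spark.sql.catalog.hadoop_prod.type", "hadoop")
     .config("spark.sql.catalog.hadoop_prod.warehouse", "hdfs://hdfsnamespace/user/hive/warehouse")
     .getOrCreate()
   ```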
   
   I use Spark Structured Streaming to write Kafka data to an Iceberg table.
   When I count the rows with Spark SQL, the result never changes, even though
   the stream keeps appending. My streaming write looks roughly like the
   sketch below.
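   
   A minimal sketch of the streaming write (the broker, topic, and checkpoint
   path are illustrative, and schema handling is simplified; the real job
   parses the Kafka value into the table's columns):
   
   ```scala
   import org.apache.spark.sql.streaming.Trigger
   
   // Read from Kafka and append to the Iceberg table through the hadoop_prod catalog.
   val kafkaDF = spark.readStream
     .format("kafka")
     .option("kafka.bootstrap.servers", "broker:9092")  // illustrative
     .option("subscribe", "recmd_feedback")             // illustrative topic name
     .load()
   
   val query = kafkaDF
     .selectExpr("CAST(value AS STRING) AS value")      // real job parses this into columns
     .writeStream
     .format("iceberg")
     .outputMode("append")
     .trigger(Trigger.ProcessingTime("30 seconds"))
     .option("path", "hadoop_prod.ice.recmd_feedback_tb")
     .option("checkpointLocation", "/tmp/checkpoints/recmd_feedback_tb")
     .start()
   ```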
   
   ```
   scala> spark.sql("select count(1) from hadoop_prod.ice.recmd_feedback_tb").show(false)
   +--------+
   |count(1)|
   +--------+
   |19573   |
   +--------+
   
   
   scala> spark.sql("select count(1) from hadoop_prod.ice.recmd_feedback_tb").show(false)
   +--------+
   |count(1)|
   +--------+
   |19573   |
   +--------+
   
   
   scala> spark.sql("select count(1) from hadoop_prod.ice.recmd_feedback_tb").show(false)
   +--------+
   |count(1)|
   +--------+
   |19573   |
   +--------+
   
   
   scala> spark.sql("refresh hadoop_prod.ice.recmd_feedback_tb").show(false)
   ++
   ||
   ++
   ++
   
   
   scala> spark.sql("select count(1) from hadoop_prod.ice.recmd_feedback_tb").show(false)
   +--------+
   |count(1)|
   +--------+
   |19573   |
   +--------+
   ```
   
   But loading the table by path with spark.read does pick up the new data:
   
   ```
   scala> val df = spark.read.format("iceberg").load("/tmp/warehouse/ice/recmd_feedback_tb")
   df: org.apache.spark.sql.DataFrame = [event_time_string: string, event_id: string ... 16 more fields]
   
   scala> df.count
   res5: Long = 19580
   
   
   scala> val df = spark.read.format("iceberg").load("/tmp/warehouse/ice/recmd_feedback_tb")
   df: org.apache.spark.sql.DataFrame = [event_time_string: string, event_id: string ... 16 more fields]
   
   scala> df.count
   res6: Long = 19582
   ```
   
   How can I refresh the table state and read the new data with Spark SQL in
   the same SparkSession?
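   
   One thing I wondered: does the SparkCatalog cache table state within a
   session? If so, would disabling that cache with the cache-enabled catalog
   property be the intended fix? (I haven't verified that this option applies
   in 0.9.1.)
   
   spark.sql.catalog.hadoop_prod.cache-enabled    false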

