rlcyf opened a new issue #2289:
URL: https://github.com/apache/iceberg/issues/2289
Spark 3.0.1
Iceberg 0.11
```
# push one record to Kafka
bin/kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
> {"user_id":1}
```
```
// consume the topic with Structured Streaming; the write to Iceberg succeeds
import java.util.concurrent.TimeUnit
import org.apache.spark.sql.streaming.Trigger

val tableIdentifier: String = ...
data.writeStream
  .format("iceberg")
  .outputMode("append")
  .trigger(Trigger.ProcessingTime(1, TimeUnit.MINUTES))
  .option("path", tableIdentifier)
  .option("checkpointLocation", checkpointPath)
  .start()
```
When I execute a query in spark-shell:
```
bin/spark-shell \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.prod=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.prod.type=hive \
  --conf spark.sql.catalog.prod.warehouse=hdfs://localhost:9000/prod \
  --conf spark.sql.warehouse.dir=hdfs://localhost:9000/prod
spark.sql("select * from prod.db.sample").count
res0: Long = 1
# count in Trino
trino:db> select count(1) from prod.db.sample;
1
(1 rows)
```
```
# push one more record
bin/kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
> {"user_id":1}
```
```
spark.sql("select * from prod.db.sample").count
res0: Long = 1
# count in Trino
trino:db> select count(1) from prod.db.sample;
2
(1 rows)
```
In Trino, the correct result is visible immediately, but the running spark-shell session still returns the old count.
When I close spark-shell and restart it:
```
spark.sql("select * from prod.db.sample").count
res0: Long = 2
```
Now the result is correct.
There is another case: if I wait for some time after inserting the data
(I don't know how long it takes) and query again in the same session,
the result is also correct.
Is a merge or compaction happening in the background?
How can I configure spark-shell so that queries return up-to-date data in real time?
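
One thing that might be worth trying, sketched here under the assumption (not confirmed anywhere above) that the stale count comes from the long-running Spark session holding a cached copy of the table's metadata rather than from any compaction: ask Spark to refresh the table before re-running the query. The snippet assumes it is typed into the same spark-shell session started above, where `spark` is the predefined session and `prod.db.sample` is the table from this report.

```scala
// Run inside the existing spark-shell session; `spark` is the SparkSession
// that spark-shell creates automatically.
// Assumption: the session is serving cached table metadata, so new snapshots
// committed by the streaming job are not picked up.

// Ask Spark to invalidate its cached view of the table.
spark.sql("REFRESH TABLE prod.db.sample")

// Re-run the same count as before and compare it with the Trino result.
spark.sql("select * from prod.db.sample").count
```

If the refresh makes the counts agree, another option to experiment with is starting spark-shell with `--conf spark.sql.catalog.prod.cache-enabled=false`, assuming that catalog property is available in the Iceberg version in use; that would trade some planning overhead for always reading the latest table state.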