[jira] [Created] (SPARK-32885) Add DataStreamReader.table API

Yuanjian Li (Jira) Mon, 14 Sep 2020 23:22:42 -0700

Yuanjian Li created SPARK-32885:
-----------------------------------

             Summary: Add DataStreamReader.table API
                 Key: SPARK-32885
                 URL: https://issues.apache.org/jira/browse/SPARK-32885
             Project: Spark
          Issue Type: New Feature
          Components: Structured Streaming
    Affects Versions: 3.1.0
            Reporter: Yuanjian Li



This ticket proposes adding a new `table` API to DataStreamReader, analogous 
to the `table` API in DataFrameReader. With it, users can obtain a streaming 
DataFrame directly from a table name. Below is a simple example:

Application 1 for initializing and starting the streaming job:
{code:java}
val path = "/home/yuanjian.li/runtime/to_be_deleted"
val tblName = "my_table"

// Write some data to `my_table`
spark.range(3).write.format("parquet").option("path", path).saveAsTable(tblName)

// Read the table as a streaming source, write result to destination directory
val table = spark.readStream.table(tblName)
table.writeStream
  .format("parquet")
  .option("checkpointLocation", "/home/yuanjian.li/runtime/to_be_deleted_ck")
  .start("/home/yuanjian.li/runtime/to_be_deleted_2")
{code}
Application 2 for appending new data:
{code:java}
// Append new data to the path backing `my_table`
spark.range(5).write
  .format("parquet")
  .option("path", "/home/yuanjian.li/runtime/to_be_deleted")
  .mode("append")
  .save()
{code}
Check the result:
{code:java}
// The destination directory should contain all written data
spark.read.parquet("/home/yuanjian.li/runtime/to_be_deleted_2").show()
{code}
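For quick experimentation, the three steps above could also be run in a single spark-shell session. The following is only a sketch under several assumptions: `spark` is the session provided by spark-shell, the proposed `readStream.table` API is available, and the `/tmp/...` paths and the `sketch_table` name are hypothetical placeholders rather than part of this proposal. It uses the existing `StreamingQuery.processAllAvailable()` method, which is intended for testing, to wait for the stream to catch up before reading the output.
{code:java}
// Hypothetical locations and table name, chosen only for this sketch
val srcPath = "/tmp/stream_table_src"
val ckPath  = "/tmp/stream_table_ck"
val dstPath = "/tmp/stream_table_dst"
val tblName = "sketch_table"

// Create the table with some initial rows
spark.range(3).write.format("parquet").option("path", srcPath).saveAsTable(tblName)

// Read the table as a streaming source (the API proposed in this ticket)
val stream = spark.readStream.table(tblName)
assert(stream.isStreaming) // a streaming DataFrame, unlike spark.read.table(tblName)

// Start copying the table's data to the destination directory
val query = stream.writeStream
  .format("parquet")
  .option("checkpointLocation", ckPath)
  .start(dstPath)

// Append more rows to the path backing the table
spark.range(5).write.format("parquet").mode("append").save(srcPath)

// Block until all data available at this point has been processed, then inspect the output
query.processAllAvailable()
spark.read.parquet(dstPath).show()

query.stop()
{code}
Running both the writer and the streaming reader in one session avoids coordinating two applications, at the cost of being less representative of a real deployment.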