[ https://issues.apache.org/jira/browse/SPARK-20378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yogesh Mahajan updated SPARK-20378:
-----------------------------------
    Description: 
We have our own Sink implementation backed by our in-memory store, and this sink 
is also queryable through Spark SQL with corresponding logical and physical 
plans. It is very similar to the memory Sink provided in Structured Streaming. 
Custom sinks are registered through DataSource, and each sink is created per 
query and hence per schema. 
A StreamingQueryManager can run multiple queries in one SparkSession, and their 
schemas can differ, so a sink needs to know the schema of the query it is 
created for. 
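
For context, here is a minimal sketch of how such a sink is wired up per query. 
The provider class name com.example.OurStoreSinkProvider, the socket source, and 
the options are illustrative only; the point is that createSink is called once 
per started query, so two queries in the same SparkSession can hand the provider 
different schemas.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("custom-sink-demo").getOrCreate()

// Source and schemas are illustrative only.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()                                            // schema: value STRING

val words = lines.selectExpr("explode(split(value, ' ')) AS word")  // a different schema

// DataSource resolves the provider (full class name here; a short name would
// need a DataSourceRegister entry) and calls createSink once per started query.
val q1 = lines.writeStream
  .format("com.example.OurStoreSinkProvider")        // hypothetical provider
  .option("checkpointLocation", "/tmp/cp1")
  .outputMode("append")
  .start()

val q2 = words.writeStream
  .format("com.example.OurStoreSinkProvider")
  .option("checkpointLocation", "/tmp/cp2")
  .outputMode("append")
  .start()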

With the proposed change, the StreamSinkProvider trait would change as follows. 

From this definition:

trait StreamSinkProvider {
  def createSink(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink
}

to this definition:

trait StreamSinkProvider {
  def createSink(
      schema: StructType,   // new: the schema of the streaming query's output
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink
}
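
For illustration, a hypothetical provider written against the proposed signature 
might look like the sketch below. OurStoreSinkProvider and InMemoryQueryableSink 
are made-up names, this compiles only once the trait is changed as above, and the 
store integration itself is omitted.

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.sources.StreamSinkProvider
import org.apache.spark.sql.streaming.OutputMode
import org.apache.spark.sql.types.StructType

// Hypothetical provider against the *proposed* signature
// (would not compile against the current trait).
class OurStoreSinkProvider extends StreamSinkProvider {
  def createSink(
      schema: StructType,                 // newly available: the query's output schema
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink = {
    // Knowing the schema up front lets the sink set up its in-memory table
    // (and the logical/physical plans over it) before the first micro-batch.
    new InMemoryQueryableSink(schema, parameters)
  }
}

class InMemoryQueryableSink(
    schema: StructType,
    parameters: Map[String, String]) extends Sink {
  def addBatch(batchId: Long, data: DataFrame): Unit = {
    // Append the micro-batch rows to the in-memory store using the known schema.
    // (Actual store integration is omitted in this sketch.)
  }
}

With the schema available in createSink, the sink can build its queryable table 
and plans before the first batch arrives, instead of having to infer the schema 
from the first addBatch call.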



> StreamSinkProvider should provide schema in createSink. 
> --------------------------------------------------------
>
>                 Key: SPARK-20378
>                 URL: https://issues.apache.org/jira/browse/SPARK-20378
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.1.0
>            Reporter: Yogesh Mahajan


