Jackey Lee created SPARK-25937:
----------------------------------

             Summary: Support user-defined schema in Kafka Source & Sink
                 Key: SPARK-25937
                 URL: https://issues.apache.org/jira/browse/SPARK-25937
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 2.4.0
            Reporter: Jackey Lee


    Kafka Source & Sink are widely used in Spark and are among the most 
frequently used connectors in streaming production environments. At present, 
however, both the Kafka Source and Sink use a fixed schema, which forces users 
to do data conversion themselves when reading from and writing to Kafka. So why 
not use a FileFormat-style mechanism to do this, just like Hive does?

    Flink has already implemented JSON/CSV/Avro format-aware Kafka Sources & 
Sinks; we can support the same in Spark.

*Main Goals:*

1. Provide a Source and Sink that support a user-defined schema. Users can read 
and write Kafka directly in their programs without additional data conversion.

2. Provide a read-write mechanism based on FileFormat. A user's data conversion 
is similar to FileFormat's read and write process, so we can provide a 
FileFormat-like mechanism that supplies common read-write format conversions 
out of the box. It should also allow users to plug in custom format 
conversions.
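To make the pain point concrete, here is a minimal, self-contained sketch (plain Python `json`, deliberately not Spark API) of the conversion every job must hand-roll today: the fixed Kafka schema exposes the message value only as raw bytes, so each pipeline repeats the same serialize/deserialize step that a schema-aware, format-aware Source & Sink could perform internally. The record layout and function names here are hypothetical illustrations, not proposed API.

```python
import json

# Kafka's fixed schema: the message value is just bytes.
# Today, every Spark job must hand-code this conversion itself.

def to_kafka_value(record: dict) -> bytes:
    """What a user must do manually before writing to the Kafka sink."""
    return json.dumps(record).encode("utf-8")

def from_kafka_value(value: bytes) -> dict:
    """What a user must do manually after reading from the Kafka source."""
    return json.loads(value.decode("utf-8"))

# With a user-defined schema plus a built-in format (json/csv/avro),
# both steps would happen inside the Source & Sink instead.
record = {"user": "alice", "clicks": 3}
raw = to_kafka_value(record)              # bytes on the wire
assert from_kafka_value(raw) == record    # round-trip recovers the record
```

Under this proposal, the round trip above would be handled by the connector itself, the same way FileFormat implementations already handle it for files.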



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
