[ https://issues.apache.org/jira/browse/SPARK-26314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722314#comment-16722314 ]
Jordan Moore edited comment on SPARK-26314 at 12/15/18 10:57 PM:
-----------------------------------------------------------------

[~dongjoon], I feel, however, that AVRO-1124 won't gain any traction. Even Avro releases themselves have been nearly non-existent over the last few years. My first point was that AVRO-1124 effectively _became_ the Confluent Schema Registry, with Jay now being CEO of Confluent. My second point: while I understand that Apache projects are maintained by different developers, each with their own direction for the project, those other projects _do support_ it. You're effectively losing a large portion of developers by not making this easily available as a Spark-offered library, or at the very least showing how to integrate it in the Spark documentation. Simply adding --packages is not enough, since it still requires a Spark SQL UDF or Encoder/Decoder to wrap Confluent's KafkaAvroSerializer classes, as done here, for example: https://github.com/xebia-france/spark-structured-streaming-blog/blob/master/src/main/scala/AvroConsumer.scala#L28-L39
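The first step any such wrapper has to perform is parsing Confluent's wire format: one magic byte (0x0), a 4-byte big-endian schema ID, and then the Avro payload with no embedded schema. A minimal sketch of that parse in plain Scala, using only java.nio — the `ConfluentRecord` case class and `parse` helper are hypothetical names for illustration, not part of Spark's or Confluent's API:

```scala
import java.nio.ByteBuffer

// Hypothetical holder for a parsed Confluent-framed Kafka message.
case class ConfluentRecord(schemaId: Int, avroPayload: Array[Byte])

object ConfluentWireFormat {
  // Confluent wire format: magic byte 0x0, then a 4-byte big-endian
  // schema registry ID, then the schema-less Avro-encoded payload.
  val MagicByte: Byte = 0x0

  def parse(bytes: Array[Byte]): ConfluentRecord = {
    require(bytes.length >= 5, "message too short for Confluent wire format")
    val buf = ByteBuffer.wrap(bytes) // big-endian by default
    val magic = buf.get()
    require(magic == MagicByte, s"unknown magic byte: $magic")
    val schemaId = buf.getInt()
    val payload = new Array[Byte](buf.remaining())
    buf.get(payload)
    ConfluentRecord(schemaId, payload)
  }
}
```

A UDF or Encoder wrapping KafkaAvroDeserializer does essentially this before the real work: strip the 5-byte header, look up the schema by ID in the registry, and decode the remaining bytes.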
> support Confluent encoded Avro in Spark Structured Streaming
> ------------------------------------------------------------
>
>                 Key: SPARK-26314
>                 URL: https://issues.apache.org/jira/browse/SPARK-26314
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: David Ahern
>            Priority: Major
>
> As Avro has now been added as a first-class citizen,
> [https://spark.apache.org/docs/latest/sql-data-sources-avro.html]
> please make Confluent-encoded Avro work out of the box with Spark Structured Streaming.
>
> As described in this link, Avro messages on Kafka encoded with the Confluent serializer also need to be decoded with Confluent. It would be great if this worked out of the box.
> [https://developer.ibm.com/answers/questions/321440/ibm-iidr-cdc-db2-to-kafka.html?smartspace=blockchain]
>
> Here are details on the Confluent encoding:
> [https://www.sderosiaux.com/articles/2017/03/02/serializing-data-efficiently-with-apache-avro-and-dealing-with-a-schema-registry/#encodingdecoding-the-messages-with-the-schema-id]
>
> It's been a year since I worked on anything to do with Avro and Spark Structured Streaming, but I had to take an approach such as this when getting it to work. This is what I used as a reference at that time:
> [https://github.com/tubular/confluent-spark-avro]
>
> Also, here is another link I found that someone has done in the meantime:
> [https://github.com/AbsaOSS/ABRiS]

--
This message was sent by Atlassian JIRA (v7.6.3#76005)