[ https://issues.apache.org/jira/browse/SPARK-26314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722314#comment-16722314 ]
Jordan Moore edited comment on SPARK-26314 at 12/15/18 10:57 PM:
-----------------------------------------------------------------

[~dongjoon], I feel, however, that AVRO-1124 won't gain any traction. Even Avro releases themselves have been nearly non-existent over the last few years. My first point was that AVRO-1124 effectively _became_ the Confluent Schema Registry, with Jay now being CEO of Confluent. My second point: while I understand that Apache projects are maintained by different developers, each with their own direction for the project, those other projects _do support_ it. You're effectively losing a large portion of developers by not making this easily available as a Spark-offered library, or at the very least showing how to integrate it in the Spark documentation. Simply adding --packages is not enough, since it still requires a Spark SQL UDF or Encoder/Decoder to wrap Confluent's KafkaAvroSerializer classes, as done here, for example: https://github.com/xebia-france/spark-structured-streaming-blog/blob/master/src/main/scala/AvroConsumer.scala#L28-L39
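The first step any such wrapper has to perform is parsing Confluent's wire format: one magic byte (0x0), a 4-byte big-endian schema ID, and then the Avro payload with no embedded schema. A minimal sketch of that parse in plain Scala, using only java.nio — the `ConfluentRecord` case class and `parse` helper are hypothetical names for illustration, not part of Spark's or Confluent's API:

```scala
import java.nio.ByteBuffer

// Hypothetical holder for a parsed Confluent-framed Kafka message.
case class ConfluentRecord(schemaId: Int, avroPayload: Array[Byte])

object ConfluentWireFormat {
  // Confluent wire format: magic byte 0x0, then a 4-byte big-endian
  // schema registry ID, then the schema-less Avro-encoded payload.
  val MagicByte: Byte = 0x0

  def parse(bytes: Array[Byte]): ConfluentRecord = {
    require(bytes.length >= 5, "message too short for Confluent wire format")
    val buf = ByteBuffer.wrap(bytes) // big-endian by default
    val magic = buf.get()
    require(magic == MagicByte, s"unknown magic byte: $magic")
    val schemaId = buf.getInt()
    val payload = new Array[Byte](buf.remaining())
    buf.get(payload)
    ConfluentRecord(schemaId, payload)
  }
}
```

A UDF or Encoder wrapping KafkaAvroDeserializer does essentially this before the real work: strip the 5-byte header, look up the schema by ID in the registry, and decode the remaining bytes.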
> support Confluent encoded Avro in Spark Structured Streaming
> ------------------------------------------------------------
>
>                 Key: SPARK-26314
>                 URL: https://issues.apache.org/jira/browse/SPARK-26314
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: David Ahern
>            Priority: Major
>
> As Avro has now been added as a first-class citizen,
> [https://spark.apache.org/docs/latest/sql-data-sources-avro.html]
> please make Confluent-encoded Avro work out of the box with Spark Structured Streaming.
>
> As described in this link, Avro messages on Kafka encoded with the Confluent serializer also need to be decoded with Confluent. It would be great if this worked out of the box.
> [https://developer.ibm.com/answers/questions/321440/ibm-iidr-cdc-db2-to-kafka.html?smartspace=blockchain]
>
> Here are details on the Confluent encoding:
> [https://www.sderosiaux.com/articles/2017/03/02/serializing-data-efficiently-with-apache-avro-and-dealing-with-a-schema-registry/#encodingdecoding-the-messages-with-the-schema-id]
>
> It's been a year since I worked on anything to do with Avro and Spark Structured Streaming, but I had to take an approach such as this when getting it to work. This is what I used as a reference at that time:
> [https://github.com/tubular/confluent-spark-avro]
>
> Also, here is another link I found that someone has done in the meantime:
> [https://github.com/AbsaOSS/ABRiS]

--
This message was sent by Atlassian JIRA (v7.6.3#76005)