[ 
https://issues.apache.org/jira/browse/SPARK-32834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-32834:
---------------------------------
    Description: 
I am trying to read a Kafka topic with Spark readStream, but I run into a 
problem when applying an Avro schema: from_avro returns empty values.

 

Code:

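{code}
import time

from pyspark.sql.avro.functions import from_avro

# Avro schema as a JSON string. Assumption: a local copy of the user.avsc
# file linked under "Input"; the original report does not show this step.
jsonFormatSchema = open("user.avsc", "r").read()

# Read the topic from the earliest offset as a streaming DataFrame.
df = spark\
  .readStream\
  .format("kafka")\
  .option("kafka.bootstrap.servers", "host:6667")\
  .option("subscribe", "utopic1")\
  .option("failOnDataLoss", "false")\
  .option("startingOffsets", "earliest")\
  .option("checkpointLocation", "/home/abc/wspace/spark_test/data/")\
  .load()

# Decode the binary Kafka "value" column with the Avro schema.
outputDF = df\
        .select(from_avro("value", jsonFormatSchema,
                          options={"mode":"FASTFAIL"}).alias("user"))

outputDF.printSchema()

# Print each micro-batch to the console and wait briefly for output.
query = outputDF.writeStream.format("console").start()
time.sleep(10)
{code}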
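
A note on the mode option used above: the Spark Avro data source documentation lists FAILFAST and PERMISSIVE as the modes that from_avro supports, while the snippet spells the value "FASTFAIL". Whether that spelling contributes to the empty result is not established in this report. As a minimal sketch, a small guard can reject unrecognized values before the query starts. SUPPORTED_AVRO_MODES and avro_options are hypothetical names introduced here for illustration; they are not part of PySpark.

```python
# Modes listed for from_avro in the Spark Avro data source guide.
SUPPORTED_AVRO_MODES = ("FAILFAST", "PERMISSIVE")


def avro_options(mode: str) -> dict:
    """Return an options dict for from_avro, rejecting unknown modes.

    Hypothetical helper: it only checks the spelling of the mode string
    and does not call Spark.
    """
    normalized = mode.strip().upper()
    if normalized not in SUPPORTED_AVRO_MODES:
        raise ValueError(
            "unsupported from_avro mode %r; expected one of %s"
            % (mode, ", ".join(SUPPORTED_AVRO_MODES))
        )
    return {"mode": normalized}
```

With such a helper, the call would read from_avro("value", jsonFormatSchema, options=avro_options("FAILFAST")), and a misspelled mode raises ValueError before the stream is started.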

Input:

Avro schema file: 
[user.avsc|https://github.com/apache/spark/raw/4ad9bfd53b84a6d2497668c73af6899bae14c187/examples/src/main/resources/user.avsc]

Kafka topic contents: {'favorite_color': 'Red', 'name': 'Alyssa'}

Expected Output:

The decoded field values should be printed to the console.

Actual Output:

{code}
+----+
|user|
+----+
|[,]|
+----+
{code}

Additional information:
 - I searched online and found that another person faced the same issue: 
[https://stackoverflow.com/questions/59222774/spark-from-avro-function-returning-null-values]
 - I am able to print the values to the console if I cast the column to a 
string using the line below: 
df.selectExpr("CAST(value AS STRING)")
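
The cast to STRING succeeding suggests, though it does not prove, that the topic holds text rather than Avro-encoded bytes; from_avro expects a binary column in Avro format. The sketch below hand-encodes the sample record following the Avro binary encoding rules (zigzag varints for lengths and union branch indexes) so that the two byte sequences can be compared. It assumes a two-field record schema (name: string, favorite_color: union of string and null), which is consistent with the two-slot [,] output above but should be checked against the linked user.avsc. encode_long, encode_string and encode_user are hypothetical helper names.

```python
def encode_long(n: int) -> bytes:
    """Encode an Avro int/long: zigzag mapping, then a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small unsigned values
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode an Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user(name: str, favorite_color=None) -> bytes:
    """Encode one record of the ASSUMED schema:
    name: string, favorite_color: ["string", "null"]."""
    out = encode_string(name)
    if favorite_color is None:
        out += encode_long(1)  # union branch 1 = null, no payload follows
    else:
        out += encode_long(0) + encode_string(favorite_color)  # branch 0 = string
    return out


avro_bytes = encode_user("Alyssa", "Red")
text_bytes = b"{'favorite_color': 'Red', 'name': 'Alyssa'}"

print(avro_bytes.hex())          # compact binary form
print(avro_bytes == text_bytes)  # the text on the topic is a different byte sequence
```

Under that assumed schema, the binary record is 12 bytes long and begins with the length prefix 0x0c, whereas the text form begins with the character "{". If the producer writes text like this to the topic, the bytes would need to be Avro-encoded on the producer side, or parsed with a text-oriented function instead of from_avro.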

 


> from_avro is giving empty result
> --------------------------------
>
>                 Key: SPARK-32834
>                 URL: https://issues.apache.org/jira/browse/SPARK-32834
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.0
>         Environment: Ubuntu 18
> Spark 3.0
> Kafka 2.0.0
>            Reporter: Chaitanya
>            Priority: Major


