[ 
https://issues.apache.org/jira/browse/SPARK-34163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271056#comment-17271056
 ] 

Hyukjin Kwon commented on SPARK-34163:
--------------------------------------

For questions, please use the Spark mailing lists before filing an issue. 
Please also see https://spark.apache.org/community.html

> Spark Structured Streaming - Kafka avro transformation on optional field 
> Failed
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-34163
>                 URL: https://issues.apache.org/jira/browse/SPARK-34163
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.7
>            Reporter: Felix Kizhakkel Jose
>            Priority: Major
>
> Hello All,
> I have a Spark Structured Streaming job that ingests data from Kafka, where 
> the Kafka messages are Avro-encoded.
> Some of the fields in the data are optional, and I have to perform a 
> transformation only if those optional fields are present in the data. 
> So I tried to check whether the column exists with:
> from pyspark.sql.utils import AnalysisException
>
> def has_column(dataframe, col):
>     """
>     Check whether a given column exists in the given DataFrame.
>     :param dataframe: the dataframe
>     :type dataframe: DataFrame
>     :param col: the column name
>     :type col: str
>     :return: True if the column exists, else False
>     :rtype: bool
>     """
>     try:
>         # Column lookup raises AnalysisException if the name cannot be resolved
>         dataframe[col]
>         return True
>     except AnalysisException:
>         return False
> But this does not seem to work when it is a streaming DataFrame. With a 
> normal DataFrame, when a column is not present the above check returns 
> False, so I can skip the transformation on the missing column. 
> On a streaming DataFrame, however, *has_column* always returns True, so the 
> transformation gets executed and causes an exception. What is the right 
> approach to check for the existence of a column in a streaming DataFrame 
> before performing a transformation?
> Why do streaming and normal DataFrames differ in behavior? How can I skip 
> a transformation on a column if it doesn't exist?
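
One possible way to approach the column-existence check quoted above is to
inspect the DataFrame's schema rather than relying on an exception. The sketch
below is only an illustration, not a confirmed fix for this issue. It assumes
the schema is available as the dict returned by df.schema.jsonValue(), and the
helper name has_field and the sample schema are hypothetical. Note that a
schema check only tells whether a column is declared. If the optional field is
declared as nullable in the Avro schema used to decode the messages, it may
always be present in the schema, with null values in records that omit it.

```python
def has_field(schema, path):
    """Return True if the dotted `path` names a field in a struct schema dict.

    `schema` is assumed to have the shape produced by df.schema.jsonValue():
    {"type": "struct", "fields": [{"name": ..., "type": ..., ...}, ...]}
    """
    node = schema
    for name in path.split("."):
        # Only struct nodes have named child fields
        if not isinstance(node, dict) or node.get("type") != "struct":
            return False
        match = next((f for f in node["fields"] if f["name"] == name), None)
        if match is None:
            return False
        # Descend into the field's type (a dict for nested structs,
        # a plain string such as "long" for primitive types)
        node = match["type"]
    return True


# Hand-written stand-in for df.schema.jsonValue(); field names are made up
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {
            "name": "address",
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "zip", "type": "string",
                     "nullable": True, "metadata": {}},
                ],
            },
            "nullable": True,
            "metadata": {},
        },
    ],
}

print(has_field(example_schema, "address.zip"))
print(has_field(example_schema, "address.city"))
```

With a real DataFrame the call would look like
has_field(df.schema.jsonValue(), "address.zip"). When the field is declared
but may be null per record, a null-aware expression in the transformation
itself is likely the better fit than skipping the column.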


