[ https://issues.apache.org/jira/browse/SPARK-31074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057505#comment-17057505 ]
Kyrill Alyoshin edited comment on SPARK-31074 at 3/12/20, 1:01 AM:
-------------------------------------------------------------------
Here you go.
Avro schema:
{code:java}
{
  "type": "record",
  "namespace": "com.domain.em",
  "name": "PracticeDiff",
  "fields": [
    {
      "name": "practiceId",
      "type": "string"
    },
    {
      "name": "value",
      "type": "string"
    },
    {
      "name": "checkedValue",
      "type": "string"
    }
  ]
}
{code}
Java code:
{code:java}
package com.domain.em;

public final class PracticeDiff {
    private String practiceId;
    private String value;
    private String checkedValue;

    public String getPracticeId() {
        return practiceId;
    }

    public String getValue() {
        return value;
    }

    public String getCheckedValue() {
        return checkedValue;
    }
}
{code}
Thank you!
> Avro serializer should not fail when a nullable Spark field is written to a
> non-null Avro column
> ------------------------------------------------------------------------------------------------
>
> Key: SPARK-31074
> URL: https://issues.apache.org/jira/browse/SPARK-31074
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.4.4
> Reporter: Kyrill Alyoshin
> Priority: Major
>
> Spark StructType schemas are strongly biased towards having _nullable_ fields.
> In fact, this is what _Encoders.bean()_ does: any non-primitive field is
> automatically _nullable_. When we attempt to serialize dataframes into
> *user-supplied* Avro schemas where the corresponding fields are marked as
> _non-null_ (i.e., they are not of _union_ type), any such attempt fails
> with the following exception:
>
> {code:java}
> Caused by: org.apache.avro.AvroRuntimeException: Not a union: "string"
>     at org.apache.avro.Schema.getTypes(Schema.java:299)
>     at org.apache.spark.sql.avro.AvroSerializer.org$apache$spark$sql$avro$AvroSerializer$$resolveNullableType(AvroSerializer.scala:229)
>     at org.apache.spark.sql.avro.AvroSerializer$$anonfun$3.apply(AvroSerializer.scala:209)
> {code}
> This seems rather draconian. We should certainly be able to write a field
> of the same type and with the same name into a non-nullable Avro column
> as long as its value is not null. In fact, the problem is so *severe* that it
> is not clear what should be done when the Avro schema is given to you
> as part of an API communication contract (i.e., it cannot be changed).
> This is an important issue.
>
>
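The behavior requested in the description can be illustrated with a small, self-contained sketch. This is not Spark's AvroSerializer code and it does not use the Avro library: `AvroType`, `resolveWriteType` and `write` are hypothetical names, and an Avro type is modeled simply as a list of branch names (one entry for a plain type such as "string", several for a union such as ["null", "string"]). The idea is that a non-union type is accepted as-is instead of failing with "Not a union", and a null is rejected only when an actual null value reaches a non-null column.

```java
import java.util.Arrays;
import java.util.List;

public final class LenientNullableResolution {

    /** Hypothetical, simplified stand-in for an Avro field type. */
    public static final class AvroType {
        final List<String> branches; // one entry = plain type, several = union

        public AvroType(String... branches) {
            this.branches = Arrays.asList(branches);
        }

        boolean isUnion() {
            return branches.size() > 1;
        }
    }

    /**
     * Picks the type that non-null values should be written as.
     * A plain type is returned unchanged rather than treated as an error;
     * for a union, the first non-null branch is returned (a simplification).
     */
    public static String resolveWriteType(AvroType avroType) {
        if (!avroType.isUnion()) {
            return avroType.branches.get(0);
        }
        for (String branch : avroType.branches) {
            if (!"null".equals(branch)) {
                return branch;
            }
        }
        throw new IllegalArgumentException("Union has no non-null branch");
    }

    /** Validates a single value at write time and returns it if it is acceptable. */
    public static String write(String fieldName, String value, AvroType avroType) {
        if (value == null) {
            if (avroType.branches.contains("null")) {
                return null; // the Avro type can hold a null
            }
            throw new IllegalStateException(
                "Null value for non-null Avro field: " + fieldName);
        }
        resolveWriteType(avroType); // succeeds for plain types and for unions
        return value;
    }

    public static void main(String[] args) {
        AvroType nonNullString = new AvroType("string");
        AvroType nullableString = new AvroType("null", "string");

        // A non-null value goes into a non-null column without complaint.
        System.out.println(write("practiceId", "p-1", nonNullString));
        // A null value is fine when the Avro type is a union containing null.
        System.out.println(write("value", null, nullableString));
        // Only an actual null in a non-null column is an error.
        try {
            write("checkedValue", null, nonNullString);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this approach the failure moves from schema resolution time to the individual record that violates the contract, which is what makes a fixed, externally supplied Avro schema usable with bean-derived Spark schemas.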