Tianhan Hu created SPARK-44919:
----------------------------------
Summary: Avro connector: convert a union of a single primitive
type to a StructType
Key: SPARK-44919
URL: https://issues.apache.org/jira/browse/SPARK-44919
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.4.1
Reporter: Tianhan Hu
The Spark Avro data source schema converter currently converts a union with a
single primitive type to a Spark primitive type instead of a StructType. For
more complex unions that consist of multiple primitive types, however, the
schema converter translates them into StructTypes.
For example:
import scala.collection.JavaConverters._
import org.apache.avro._
import org.apache.spark.sql.avro._
// ["string", "null"]
SchemaConverters.toSqlType(
  Schema.createUnion(Seq(
    Schema.create(Schema.Type.STRING),
    Schema.create(Schema.Type.NULL)).asJava)
).dataType
// ["string", "int", "null"]
SchemaConverters.toSqlType(
  Schema.createUnion(Seq(
    Schema.create(Schema.Type.STRING),
    Schema.create(Schema.Type.INT),
    Schema.create(Schema.Type.NULL)).asJava)
).dataType
The first call returns StringType; the second returns a StructType with a
StringType field and an IntegerType field.
We propose adding a new configuration to control this conversion behavior. The
default behavior would remain unchanged. When the configuration is changed from
its default, a union with a single primitive type would be translated into a
StructType.
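The proposed rule can be sketched with a small, dependency-free model. This is
only an illustration under stated assumptions, not the converter's actual code:
the AvroType and SqlType classes below are toy stand-ins for the real Avro and
Spark types, singleAsStruct is a hypothetical name for the proposed
configuration, and the member0, member1 field names are modeled on the naming
the existing converter is understood to use. Other special cases in the real
converter are deliberately left out.

```scala
// Toy stand-ins for Avro and Spark SQL types (NOT the real library classes).
sealed trait AvroType
case object AvroNull extends AvroType
case object AvroString extends AvroType
case object AvroInt extends AvroType

sealed trait SqlType
case object SqlString extends SqlType
case object SqlInt extends SqlType
final case class SqlStruct(fields: Seq[(String, SqlType)]) extends SqlType

object UnionConversionSketch {
  // Map a non-null primitive Avro type to its SQL counterpart.
  private def primitive(t: AvroType): SqlType = t match {
    case AvroString => SqlString
    case AvroInt    => SqlInt
    case AvroNull   =>
      throw new IllegalArgumentException("null has no SQL type of its own")
  }

  // `singleAsStruct` is a hypothetical flag standing in for the proposed
  // configuration; its default (false) reproduces the current behavior.
  def convertUnion(members: Seq[AvroType],
                   singleAsStruct: Boolean = false): SqlType = {
    // The null branch only affects nullability, so it is dropped first.
    val nonNull = members.filterNot(_ == AvroNull)
    require(nonNull.nonEmpty, "union must contain a non-null member")

    nonNull match {
      // Current behavior: a single remaining primitive is returned unwrapped.
      case Seq(single) if !singleAsStruct => primitive(single)
      // Multiple members, or the flag is set: one struct field per member.
      case _ =>
        SqlStruct(nonNull.zipWithIndex.map { case (t, i) =>
          (s"member$i", primitive(t))
        })
    }
  }
}
```

With the flag left at its default, Seq(AvroString, AvroNull) maps to SqlString
as it does today. With singleAsStruct = true, the same union maps to a
one-field SqlStruct, matching the shape already produced for
Seq(AvroString, AvroInt, AvroNull).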