Tianhan Hu created SPARK-44919:
----------------------------------

             Summary: Avro connector: convert a union of a single primitive 
type to a StructType
                 Key: SPARK-44919
                 URL: https://issues.apache.org/jira/browse/SPARK-44919
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.4.1
            Reporter: Tianhan Hu


The Spark Avro data source schema converter currently converts a union with a 
single primitive type to a Spark primitive type instead of a StructType.

For more complex union types that consist of multiple primitive types, however, 
the schema converter translates them into StructTypes.

For example:

import scala.collection.JavaConverters._
import org.apache.avro._
import org.apache.spark.sql.avro._

// ["string", "null"]
SchemaConverters.toSqlType(
  Schema.createUnion(Seq(
    Schema.create(Schema.Type.STRING),
    Schema.create(Schema.Type.NULL)).asJava)
).dataType

// ["string", "int", "null"]
SchemaConverters.toSqlType(
  Schema.createUnion(Seq(
    Schema.create(Schema.Type.STRING),
    Schema.create(Schema.Type.INT),
    Schema.create(Schema.Type.NULL)).asJava)
).dataType

The first call returns StringType; the second returns a StructType with one 
field per non-null branch, i.e. StructType(StringType, IntegerType).
 
We propose adding a new configuration to control this conversion behavior. The 
default behavior would remain the same. When the config is changed from its 
default, a union with a single primitive type would be translated into a 
StructType.
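
As a rough illustration of the proposed opt-in behavior, the sketch below wraps 
the existing converter. The helper name toSqlTypeWithStructUnion and the flag 
singleBranchAsStruct are hypothetical and are not part of Spark; the field name 
"member0" is an assumption that mirrors the memberN naming the converter already 
uses for multi-branch unions. The actual configuration name and implementation 
are not specified in this issue.

```scala
import scala.collection.JavaConverters._
import org.apache.avro.Schema
import org.apache.spark.sql.avro.SchemaConverters
import org.apache.spark.sql.types._

// Hypothetical helper (not a Spark API): emulates the proposed behavior on top
// of the existing SchemaConverters.toSqlType.
def toSqlTypeWithStructUnion(avro: Schema, singleBranchAsStruct: Boolean): DataType = {
  // Current behavior: ["string", "null"] converts to StringType.
  val converted = SchemaConverters.toSqlType(avro).dataType

  // A union qualifies when exactly one of its branches is not "null".
  val isSingleBranchUnion =
    avro.getType == Schema.Type.UNION &&
      avro.getTypes.asScala.count(_.getType != Schema.Type.NULL) == 1

  if (singleBranchAsStruct && isSingleBranchUnion) {
    // Assumed shape: a one-field struct, named like the multi-branch case.
    StructType(Seq(StructField("member0", converted, nullable = true)))
  } else {
    converted
  }
}

val stringOrNull = Schema.createUnion(Seq(
  Schema.create(Schema.Type.STRING),
  Schema.create(Schema.Type.NULL)).asJava)

toSqlTypeWithStructUnion(stringOrNull, singleBranchAsStruct = false)
toSqlTypeWithStructUnion(stringOrNull, singleBranchAsStruct = true)
```

With the flag off, the helper returns whatever the converter returns today, so 
existing schemas are unaffected; with it on, single-branch and multi-branch 
unions both map to a StructType.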



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

