[ https://issues.apache.org/jira/browse/SPARK-24855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553040#comment-16553040 ]
Apache Spark commented on SPARK-24855:
--------------------------------------
User 'lindblombr' has created a pull request for this issue:
https://github.com/apache/spark/pull/21847
> Built-in AVRO support should support specified schema on write
> --------------------------------------------------------------
>
> Key: SPARK-24855
> URL: https://issues.apache.org/jira/browse/SPARK-24855
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Brian Lindblom
> Assignee: Brian Lindblom
> Priority: Minor
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> spark-avro appears to have been brought in from an upstream project,
> https://github.com/databricks/spark-avro. A while ago I opened a PR there to
> add a 'forceSchema' option, which lets the caller specify the Avro schema
> used to write records; we need this for several of our use cases. That PR
> was never merged, but I would like to add the feature to the Avro
> reader/writer code that was brought into Spark. The original PR is linked
> below; I will follow up with a more formal PR/patch rebased on the Spark
> master branch:
> https://github.com/databricks/spark-avro/pull/222
>
> This change allows the user to specify a schema, which should be compatible
> with the schema that spark-avro generates from the dataset definition. With
> it, a user can specify default values or change union ordering. It also
> covers the case where you read in an Avro data set, do some in-line field
> cleansing, and then write the result out with the original schema, so that
> the original schema is preserved in the output container files. I've had
> several use cases where this behavior was desired, and there were several
> other requests for it in the spark-avro project.
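
To make the request above concrete, here is a minimal sketch of the kind of
schema difference being described. It is an illustration under stated
assumptions, not the code from either PR: both schemas, the record and field
names, and the helper functions are hypothetical. The option name "avroSchema"
in write_with_schema is also an assumption (the upstream PR used
'forceSchema'), so check the documentation for your Spark version.

```python
import json

# Hypothetical schema of the kind a writer might derive automatically from a
# dataset with a long column "id" and a nullable string column "name".
generated_schema = {
    "type": "record",
    "name": "topLevelRecord",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": ["string", "null"]},
    ],
}

# Hypothetical user-specified schema for the same columns. It changes the
# union ordering to put "null" first and adds a default value, which Avro
# requires to match the first branch of the union.
specified_schema = {
    "type": "record",
    "name": "User",
    "namespace": "com.example",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": ["null", "string"], "default": None},
    ],
}


def field_names(schema):
    """Return the top-level field names of a record schema, in order."""
    return [field["name"] for field in schema["fields"]]


def same_columns(a, b):
    """Rough sanity check only: same field names in the same order.

    This is NOT Avro schema resolution; it just illustrates that the two
    schemas describe the same columns.
    """
    return field_names(a) == field_names(b)


# Writer options take the schema as a JSON string.
specified_schema_json = json.dumps(specified_schema)


def write_with_schema(df, path, schema_json, option_name="avroSchema"):
    """Write a Spark DataFrame as Avro using a caller-supplied schema.

    Assumes a Spark build with the Avro data source available. The option
    name is an assumption and may differ between versions.
    """
    df.write.format("avro").option(option_name, schema_json).save(path)
```

With a Spark session and a DataFrame df at hand, usage would look like
write_with_schema(df, "/tmp/users_avro", specified_schema_json), where the
output path is again just a placeholder.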