[ https://issues.apache.org/jira/browse/SPARK-30296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-30296:
----------------------------------
Affects Version/s: 3.0.0 (was: 2.4.4)
> Dataset diffing transformation
> ------------------------------
>
> Key: SPARK-30296
> URL: https://issues.apache.org/jira/browse/SPARK-30296
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Enrico Minack
> Priority: Major
>
> Evolving Spark code needs frequent regression testing to prove that it still
> produces identical results or, where changes are expected, to investigate those
> changes. Diffing the Datasets of two code paths provides that confidence.
> Diffing small schemata is easy, but with wide schemata the Spark query becomes
> laborious and error-prone. With a single proven and tested method, diffing
> becomes an easier and more reliable operation. As a Dataset transformation,
> it is available directly through the Dataset API.
> This has proven useful for interactive Spark sessions as well as deployed
> production code.
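
To make the idea in the description concrete, here is a minimal sketch of a keyed diff in plain Python. It is not Spark code and not the API proposed in this issue. It assumes each row is a dict with an `id` key column, and the function name `diff_rows` and the four labels are hypothetical, chosen only for illustration. The logic mirrors what a hand-written query would do: a full outer join on the key followed by a comparison of the remaining columns.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def diff_rows(
    left: Iterable[Row], right: Iterable[Row], key: str = "id"
) -> List[Tuple[str, Any, Any, Any]]:
    """Label every key as unchanged, changed, deleted or inserted.

    Hypothetical helper: behaves like a full outer join on `key`
    followed by a whole-row comparison.
    """
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    result = []
    # The union of both key sets plays the role of the full outer join.
    for k in sorted(left_by_key.keys() | right_by_key.keys()):
        l, r = left_by_key.get(k), right_by_key.get(k)
        if l is None:
            label = "inserted"   # key exists only on the right side
        elif r is None:
            label = "deleted"    # key exists only on the left side
        elif l == r:
            label = "unchanged"  # all columns are equal
        else:
            label = "changed"    # same key, at least one column differs
        result.append((label, k, l, r))
    return result


# Made-up sample rows standing in for the output of two code paths.
before = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}, {"id": 3, "value": "c"}]
after = [{"id": 1, "value": "a"}, {"id": 2, "value": "B"}, {"id": 4, "value": "d"}]

diff = diff_rows(before, after)
for label, k, l, r in diff:
    print(label, k, l, r)
```

With only two columns this is short, but in a Spark query every additional column adds another comparison to write and maintain, which is the burden the description attributes to wide schemata.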