Yeah, I have been thinking about this too, and Holden did some work here
that this SPIP will reuse. I support this.

On Wed, 14 Jun 2023 at 08:10, Amanda Liu <amanda....@databricks.com.invalid>
wrote:

> Hi all,
>
> I'd like to start a discussion about implementing an official PySpark test
> framework. Currently, there is no official test framework, only various
> open-source repos and blog posts.
>
> Many of these open-source resources are very popular, which demonstrates
> user demand for PySpark testing capabilities. spark-testing-base
> <https://github.com/holdenk/spark-testing-base> has 1.4k stars, and chispa
> <https://github.com/MrPowers/chispa> has 532k downloads/month. However,
> it can be confusing for users to piece together disparate resources to
> write their own PySpark tests (see The Elephant in the Room: How to Write
> PySpark Tests
> <https://towardsdatascience.com/the-elephant-in-the-room-how-to-write-pyspark-unit-tests-a5073acabc34>
> ).
>
> We can streamline and simplify the testing process by incorporating test
> features, such as a PySpark Test Base class (which allows tests to share
> Spark sessions) and test util functions (for example, asserting dataframe
> and schema equality).
>
> Please see the SPIP document attached:
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
> And the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44042
>
> I would appreciate it if you could share your thoughts on this proposal.
>
> Thank you!
> Amanda Liu
>