[ https://issues.apache.org/jira/browse/SPARK-25344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641294#comment-16641294 ]
Hyukjin Kwon commented on SPARK-25344:
--------------------------------------

[~irashid], would you mind if I try taking a look at this again?

> Break large tests.py files into smaller files
> ---------------------------------------------
>
>                 Key: SPARK-25344
>                 URL: https://issues.apache.org/jira/browse/SPARK-25344
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> We've got a ton of tests in one humongous tests.py file, rather than breaking
> it out into smaller files.
> Having one huge file doesn't seem great for code organization, and it also
> makes the test parallelization in run-tests.py not work as well. On my
> laptop, tests.py takes 150s, and the next longest test file takes only 20s.
> There are similarly large files in other pyspark modules, e.g. sql/tests.py,
> ml/tests.py, mllib/tests.py, streaming/tests.py.
> It seems that at least for some of these files, it's already broken into
> independent test classes, so it shouldn't be too hard to just move them into
> their own files.
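
To illustrate the parallelization point in the issue description, here is a rough sketch of why one 150s file limits the wall-clock time. It assumes that whole test files are handed out to a fixed pool of workers, longest first; that is a simplification and not necessarily how run-tests.py schedules work. Only the 150s and 20s timings come from the report. The other durations, the worker count of 4, and the `makespan` helper are hypothetical.

```python
import heapq


def makespan(durations, workers):
    """Wall-clock time if each file goes to the least-loaded worker, longest file first."""
    loads = [0] * workers
    heapq.heapify(loads)
    for seconds in sorted(durations, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + seconds)
    return max(loads)


# 150 and 20 are the timings quoted in the issue; the rest are made-up values.
one_big_file = [150, 20, 15, 10, 5]

# Hypothetical split of the 150s file into five files of 30s each.
split_files = [30] * 5 + [20, 15, 10, 5]

print(makespan(one_big_file, 4))  # 150: the big file alone sets the total time
print(makespan(split_files, 4))   # 60: same 200s of tests, spread more evenly
```

With a single 150s file, adding workers cannot bring the run below 150s, because a file is the smallest unit that can be scheduled.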