[ https://issues.apache.org/jira/browse/SPARK-25344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606679#comment-16606679 ]
Hyukjin Kwon commented on SPARK-25344: -------------------------------------- I actually roughly tried this and then quit before since the change it super huge .... one thing we should also consider is to rewrite testing script (considering resource usage since it's ran in multiprocess fashion), and module relations under sparktestsupport IIRC .. at the very least I would get rid of newbie tag .... > Break large tests.py files into smaller files > --------------------------------------------- > > Key: SPARK-25344 > URL: https://issues.apache.org/jira/browse/SPARK-25344 > Project: Spark > Issue Type: Improvement > Components: PySpark > Affects Versions: 2.4.0 > Reporter: Imran Rashid > Priority: Major > Labels: newbie > > We've got a ton of tests in one humongous tests.py file, rather than breaking > it out into smaller files. > Having one huge file doesn't seem great for code organization, and it also > makes the test parallelization in run-tests.py not work as well. On my > laptop, tests.py takes 150s, and the next longest test file takes only 20s. > There are similarly large files in other pyspark modules, eg. sql/tests.py, > ml/tests.py, mllib/tests.py, streaming/tests.py. > It seems that at least for some of these files, its already broken into > independent test classes, so it shouldn't be too hard to just move them into > their own files. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org