(Also, maybe there are already good unit tests, and I just don't know where to find them, as Bryan Cutler pointed out for the bit of code I was originally asking about.)
On Wed, Aug 29, 2018 at 3:26 PM Imran Rashid <iras...@cloudera.com> wrote: > Hi Li, > > yes that makes perfect sense. That more-or-less is the same as my view, > though I framed it differently. I guess in that case, I'm really asking: > > Can pyspark changes please be accompanied by more unit tests, and not > assume we're getting coverage from doctests? > > Imran > > On Wed, Aug 29, 2018 at 2:02 PM Li Jin <ice.xell...@gmail.com> wrote: > >> Hi Imran, >> >> My understanding is that doctests and unittests are orthogonal - doctests >> are used to make sure docstring examples are correct and are not meant to >> replace unittests. >> Functionalities are covered by unit tests to ensure correctness and >> doctests are used to test the docstring, not the functionalities itself. >> >> There are issues with doctests, for example, we cannot test arrow related >> functions in doctest because of pyarrow is optional dependency, but I think >> that's a separate issue. >> >> Does this make sense? >> >> Li >> >> On Wed, Aug 29, 2018 at 6:35 PM Imran Rashid <iras...@cloudera.com.invalid> >> wrote: >> >>> Hi, >>> >>> I'd like to propose that we move away from such heavy reliance on >>> doctests in python, and move towards more traditional unit tests. The main >>> reason is that its hard to share test code in doc tests. For example, I >>> was just looking at >>> >>> https://github.com/apache/spark/commit/82c18c240a6913a917df3b55cc5e22649561c4dd >>> and wondering if we had any tests for some of the pyspark changes. >>> SparkSession.createDataFrame has doctests, but those are just run with one >>> standard spark configuration, which does not enable arrow. Its hard to >>> easily reuse that test, just with another spark context with a different >>> conf. Similarly I've wondered about reusing test cases but with >>> local-cluster instead of local mode. I feel like they also discourage >>> writing a test which tries to get more exhaustive coverage on corner cases. >>> >>> I'm not saying we should stop using doctests -- I see why they're nice. >>> I just think they should really only be when you want that code snippet in >>> the doc anyway, so you might as well test it. >>> >>> Admittedly, I'm not really a python-developer, so I could be totally >>> wrong about the right way to author doctests -- pushback welcome! >>> >>> Thoughts? >>> >>> thanks, >>> Imran >>> >>