+1 on what Li said.

And +1 on getting more coverage in unit tests - however, we often omit 
Python unit tests deliberately when the Python “wrapper” is trivial. That is what 
I’ve learned over the years from the previous PySpark maintainers. Admittedly, 
gaps are there.


________________________________
From: Imran Rashid <iras...@cloudera.com.invalid>
Sent: Wednesday, August 29, 2018 1:42 PM
To: ice.xell...@gmail.com
Cc: dev
Subject: Re: [DISCUSS] move away from python doctests

(Also, maybe there are already good unit tests, and I just don't know where to 
find them, as Bryan Cutler pointed out for the bit of code I was originally 
asking about.)

On Wed, Aug 29, 2018 at 3:26 PM Imran Rashid 
<iras...@cloudera.com<mailto:iras...@cloudera.com>> wrote:
Hi Li,

yes, that makes perfect sense.  That is more or less the same as my view, though 
I framed it differently.  I guess in that case, I'm really asking:

Can pyspark changes please be accompanied by more unit tests, and not assume 
we're getting coverage from doctests?

Imran

On Wed, Aug 29, 2018 at 2:02 PM Li Jin 
<ice.xell...@gmail.com<mailto:ice.xell...@gmail.com>> wrote:
Hi Imran,

My understanding is that doctests and unit tests are orthogonal - doctests are 
used to make sure docstring examples are correct and are not meant to replace 
unit tests.
Functionality is covered by unit tests to ensure correctness; doctests 
exercise the docstrings, not the functionality itself.

There are issues with doctests - for example, we cannot test Arrow-related 
functions in doctests because pyarrow is an optional dependency - but I think 
that's a separate issue.
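One possible workaround for the optional-dependency problem (just a sketch - the to_arrow function here is hypothetical, standing in for any Arrow-backed API) is to mark the example with the standard SKIP directive so the doctest runner never executes it:

```python
import doctest


def to_arrow(values):
    """Convert a list to an Arrow array (requires pyarrow).

    >>> to_arrow([1, 2, 3])  # doctest: +SKIP
    <pyarrow.lib.Int64Array object at ...>
    """
    import pyarrow as pa  # deferred import: pyarrow may not be installed
    return pa.array(values)


# The skipped example is never run, so the doctest pass succeeds even
# on machines without pyarrow installed.
failures, attempted = doctest.testmod()
```

The downside, of course, is that a skipped example is exactly the untested snippet this thread is worried about.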

Does this make sense?

Li

On Wed, Aug 29, 2018 at 6:35 PM Imran Rashid <iras...@cloudera.com.invalid> 
wrote:
Hi,

I'd like to propose that we move away from such heavy reliance on doctests in 
python, and move towards more traditional unit tests.  The main reason is that 
it's hard to share test code in doctests.  For example, I was just looking at
https://github.com/apache/spark/commit/82c18c240a6913a917df3b55cc5e22649561c4dd
 and wondering whether we had any tests for some of the pyspark changes.  
SparkSession.createDataFrame has doctests, but those run with only one 
standard spark configuration, which does not enable arrow.  It's hard to reuse 
that test with another spark context that has a different conf.  
Similarly, I've wondered about reusing test cases with local-cluster instead 
of local mode.  I also feel doctests discourage writing tests that aim for 
more exhaustive coverage of corner cases.
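A rough sketch of the kind of reuse unit tests allow (plain Python, not real pyspark - make_backend here is a hypothetical stand-in for building a SparkSession with a given conf): write the test body once in a base class, then subclass it with a different configuration so the same assertions run under each.

```python
import unittest


def make_backend(conf):
    # Hypothetical stand-in for SparkSession.builder.config(...).getOrCreate()
    return {"arrow": conf.get("arrow", False)}


class CreateDataFrameTestBase(unittest.TestCase):
    conf = {"arrow": False}

    def setUp(self):
        self.backend = make_backend(self.conf)

    def test_roundtrip(self):
        # The same test body runs once per configuration subclass.
        data = [1, 2, 3]
        self.assertEqual(sorted(data), data)


class CreateDataFrameArrowTest(CreateDataFrameTestBase):
    # Only the configuration changes; every inherited test re-runs.
    conf = {"arrow": True}


loader = unittest.defaultTestLoader
suite = loader.loadTestsFromTestCase(CreateDataFrameTestBase)
suite.addTests(loader.loadTestsFromTestCase(CreateDataFrameArrowTest))
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A doctest, by contrast, is a string baked into one docstring and runs under whatever single configuration the doctest runner sets up.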

I'm not saying we should stop using doctests -- I see why they're nice.  I just 
think they should be used only when you want that code snippet in the docs 
anyway, so you might as well test it.

Admittedly, I'm not really a Python developer, so I could be totally wrong 
about the right way to author doctests -- pushback welcome!

Thoughts?

thanks,
Imran
