Hi Imran,

On Wed, 29 Aug 2018 at 22:26, Imran Rashid <iras...@cloudera.com.invalid>
wrote:

> Hi Li,
>
> Yes, that makes perfect sense. That is more or less the same as my view,
> though I framed it differently. I guess in that case, I'm really asking:
>
> Can pyspark changes please be accompanied by more unit tests, and not
> assume we're getting coverage from doctests?
>

I don't think such assumptions are made, or at least I haven't seen any
evidence of that.

However, we often assume that particular components are already tested in
the Scala API (SQL, ML), and intentionally don't repeat those tests.


>
> Imran
>
> On Wed, Aug 29, 2018 at 2:02 PM Li Jin <ice.xell...@gmail.com> wrote:
>
>> Hi Imran,
>>
>> My understanding is that doctests and unit tests are orthogonal: doctests
>> are used to make sure docstring examples are correct and are not meant to
>> replace unit tests.
>> Functionality is covered by unit tests to ensure correctness, and
>> doctests are used to test the docstring, not the functionality itself.
>>
>> There are issues with doctests; for example, we cannot test Arrow-related
>> functions in doctests because pyarrow is an optional dependency, but I think
>> that's a separate issue.
>>
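On the pyarrow point above, a plain unit test can at least guard on the optional dependency and skip cleanly instead of failing. A minimal sketch, assuming only the standard library's `unittest.skipUnless` and `importlib.util.find_spec`; the test class, its name, and its assertions are hypothetical and not taken from Spark's test suite:

```python
import importlib.util
import io
import unittest

# True only when the optional dependency is importable in this environment.
HAVE_PYARROW = importlib.util.find_spec("pyarrow") is not None


class ArrowOptionalTests(unittest.TestCase):
    """Hypothetical test case; Spark's real Arrow tests are organized differently."""

    @unittest.skipUnless(HAVE_PYARROW, "pyarrow is not installed")
    def test_table_from_columns(self):
        # Imported inside the test so this module still loads without pyarrow.
        import pyarrow as pa

        table = pa.Table.from_pydict({"a": [1, 2, 3]})
        self.assertEqual(table.num_rows, 3)
        self.assertEqual(table.to_pydict(), {"a": [1, 2, 3]})


def run_tests():
    """Run the case quietly and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(ArrowOptionalTests)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

A skipped test still appears in the runner's report, so a missing dependency shows up as "skipped" rather than as silently untested code.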
>> Does this make sense?
>>
>> Li
>>
>> On Wed, Aug 29, 2018 at 6:35 PM Imran Rashid <iras...@cloudera.com.invalid>
>> wrote:
>>
>>> Hi,
>>>
>>> I'd like to propose that we move away from such heavy reliance on
>>> doctests in Python, and move towards more traditional unit tests. The main
>>> reason is that it's hard to share test code in doctests. For example, I
>>> was just looking at
>>>
>>> https://github.com/apache/spark/commit/82c18c240a6913a917df3b55cc5e22649561c4dd
>>> and wondering whether we had any tests for some of the pyspark changes.
>>> SparkSession.createDataFrame has doctests, but those are only run with one
>>> standard Spark configuration, which does not enable Arrow. It's hard to
>>> reuse that test with another SparkContext that has a different conf.
>>> Similarly, I've wondered about reusing test cases with local-cluster
>>> mode instead of local mode. I also feel that doctests discourage
>>> writing tests that aim for more exhaustive coverage of corner cases.
>>>
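The reuse described above (same assertions, different configuration) can be sketched with the standard library alone: one doctest documents the common case, while a mixin holds unit-test bodies that are re-run under each configuration. `to_rows` and its `fast_path` flag are hypothetical stand-ins for something like `createDataFrame` with and without Arrow; they are not PySpark APIs.

```python
import doctest
import io
import unittest


def to_rows(data, fast_path=False):
    """Convert a dict of equal-length columns into a list of row tuples.

    >>> to_rows({"a": [1, 2], "b": ["x", "y"]})
    [(1, 'x'), (2, 'y')]
    """
    columns = list(data.values())
    if fast_path:
        # Hypothetical optimized path (think: an Arrow-enabled conversion).
        return list(zip(*columns))
    # Reference row-at-a-time path, the only one the doctest exercises.
    n = len(columns[0]) if columns else 0
    return [tuple(col[i] for col in columns) for i in range(n)]


class ToRowsTestsMixin:
    """Test bodies written once; each subclass picks a configuration."""

    fast_path = False

    def convert(self, data):
        return to_rows(data, fast_path=self.fast_path)

    def test_basic(self):
        self.assertEqual(self.convert({"a": [1, 2], "b": ["x", "y"]}),
                         [(1, "x"), (2, "y")])

    def test_empty_columns(self):
        self.assertEqual(self.convert({"a": [], "b": []}), [])

    def test_no_columns(self):
        self.assertEqual(self.convert({}), [])


class ReferencePathTests(ToRowsTestsMixin, unittest.TestCase):
    fast_path = False


class FastPathTests(ToRowsTestsMixin, unittest.TestCase):
    fast_path = True


def run_doctest():
    """Run the docstring example once, as the docs would show it."""
    runner = doctest.DocTestRunner()
    finder = doctest.DocTestFinder()
    for test in finder.find(to_rows, "to_rows", globs={"to_rows": to_rows}):
        runner.run(test)
    return runner.summarize(verbose=False)


def run_unit_tests():
    """Run every mixin test under both configurations."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (ReferencePathTests, FastPathTests):
        suite.addTests(loader.loadTestsFromTestCase(case))
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

Adding another configuration (for example, a hypothetical local-cluster variant) is one more small subclass rather than a copied doctest, and the corner-case tests stay out of the user-facing docs.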
>>> I'm not saying we should stop using doctests; I see why they're nice.
>>> I just think they should really only be used when you want that code
>>> snippet in the docs anyway, so you might as well test it.
>>>
>>> Admittedly, I'm not really a Python developer, so I could be totally
>>> wrong about the right way to author doctests; pushback welcome!
>>>
>>> Thoughts?
>>>
>>> thanks,
>>> Imran
>>>
>>
