Re: testing frameworks

2019-02-04 Thread Marco Mistroni
Thanks Hichame, I will follow up on that.

Is anyone on this list using the Python version of spark-testing-base? It seems
there's support for DataFrames.

Thanks in advance and regards,
 Marco



Re: testing frameworks

2019-02-03 Thread Hichame El Khalfi
Hi,
You can use pysparkling: https://github.com/svenkreiss/pysparkling
This library is useful if your code uses RDDs.

Hope this helps,

Hichame





Re: testing frameworks

2019-02-03 Thread Marco Mistroni
Hi,
sorry to resurrect this thread.
Are there any Spark libraries for testing code in PySpark? The GitHub code above
seems related to Scala.
Following links in the original threads (and also LMGTFY), I found
pytest-spark on PyPI: https://pypi.org/project/pytest-spark/

w/kindest regards
 Marco






Re: testing frameworks

2018-06-12 Thread Ryan Adams
We use spark-testing-base for unit testing. These tests run on a very
small amount of data that covers all the paths the code can take (or most
paths, anyway).
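A minimal sketch of that idea, assuming the per-record logic is kept in a plain static method with no Spark types (the class CountryRules and its cleaning rule are hypothetical, not taken from spark-testing-base): a handful of tiny fixtures, one per branch, cover every path without starting a SparkContext, and the same method could then be passed to something like rdd.map(CountryRules::normalize) inside the job.

```java
import java.util.Locale;

// Hypothetical per-record cleaning rule, deliberately free of Spark types so it can be
// unit tested on tiny inputs; a job could apply it with rdd.map(CountryRules::normalize).
final class CountryRules {
    private CountryRules() {}

    // Tiny fixtures, one per branch: {input, expected output}.
    static final String[][] CASES = {
        {null, "UNKNOWN"},   // missing value
        {"   ", "UNKNOWN"},  // blank value
        {" se ", "SE"},      // normal value: trimmed and upper-cased
    };

    static String normalize(String raw) {
        if (raw == null) {
            return "UNKNOWN";
        }
        String trimmed = raw.trim();
        if (trimmed.isEmpty()) {
            return "UNKNOWN";
        }
        return trimmed.toUpperCase(Locale.ROOT);
    }

    // Runs every fixture and reports whether all branches behave as expected.
    static boolean allCasesPass() {
        for (String[] c : CASES) {
            if (!normalize(c[0]).equals(c[1])) {
                return false;
            }
        }
        return true;
    }
}
```

Keeping the branch coverage in cheap, Spark-free tests is a design choice; a few tests that run the job on a local SparkContext would still be needed to check the wiring.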

https://github.com/holdenk/spark-testing-base

For integration testing we use automated routines to ensure that aggregate
values match an aggregate baseline.
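A minimal sketch of such a routine, assuming the job's aggregates (for example a total per key) have already been computed and collected into a map, and that the baseline is stored as another map. The class name and the relative-tolerance rule are illustrative choices, not part of any library. It reports missing, unexpected and mismatched keys rather than stopping at the first difference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Compares collected aggregate values (for example a total per key) against a stored
// baseline, allowing a small relative tolerance for floating-point noise.
final class BaselineCheck {
    private BaselineCheck() {}

    // Returns one message per problem; an empty list means the aggregates match the baseline.
    static List<String> compare(Map<String, Double> baseline, Map<String, Double> actual,
                                double relTol) {
        List<String> problems = new ArrayList<>();
        TreeSet<String> keys = new TreeSet<>(baseline.keySet()); // sorted for stable output
        keys.addAll(actual.keySet());
        for (String key : keys) {
            Double expected = baseline.get(key);
            Double got = actual.get(key);
            if (expected == null) {
                problems.add("unexpected key " + key);
            } else if (got == null) {
                problems.add("missing key " + key);
            } else {
                double allowed = relTol * Math.max(Math.abs(expected), Math.abs(got));
                if (Math.abs(expected - got) > allowed) {
                    problems.add(key + ": expected " + expected + " but got " + got);
                }
            }
        }
        return problems;
    }
}
```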

Ryan

Ryan Adams
radams...@gmail.com



Re: testing frameworks

2018-06-12 Thread Lars Albertsson
Hi,

I wrote this answer to the same question a couple of years ago:
https://www.mail-archive.com/user%40spark.apache.org/msg48032.html

I have made a couple of presentations on the subject. Slides and video
are linked on this page: http://www.mapflat.com/presentations/

You can find more material in this list of resources:
http://www.mapflat.com/lands/resources/reading-list

Happy testing!

Regards,



Lars Albertsson
Data engineering consultant
www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109
Calendar: http://www.mapflat.com/calendar



-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: testing frameworks

2018-06-04 Thread Spico Florin
Hello!
Thank you very much for your helpful answer and for the very good work
on spark-testing-base. I managed to write unit tests with the
spark-testing-base library by following the provided article, and I also took
inspiration from

https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/java/com/holdenkarau/spark/testing/SampleJavaRDDTest.java


I had some concerns about how to compare RDDs that come from a DataFrame
with RDDs that come from the jsc().parallelize() method.

My test workflow is as follows:
1. Read the data from a Parquet file as a DataFrame.
2. Convert the DataFrame with toJavaRDD().
3. Perform some mapping on the JavaRDD.
4. Check whether the resulting mapped RDD is equal to the expected one
   (read from a text file).

I implemented the test above with the following code snippet:

    JavaRDD expected = jsc().parallelize(input_from_text_file);
    SparkSession spark = SparkSession.builder().getOrCreate();

    // Dataset<Row>.toJavaRDD() yields a JavaRDD<Row>
    JavaRDD<Row> input =
        spark.read().parquet("src/test/resources/test_data.parquet").toJavaRDD();

    JavaRDD result = MyDriver.convertToMyCustomerData(input);
    JavaRDDComparisons.assertRDDEquals(expected, result);

The test above failed, even though the data is the same. While debugging, I
observed that the data that came from the DataFrame did not have the same
order as the data that came from jsc().parallelize(text_file).

So I suppose the issue comes from the fact that the SparkSession and jsc()
do not share the same SparkContext (there is a warning about this when
running the program).

Therefore, my workaround was to use the same jsc() for both the expected and
the result RDDs, by collecting the DataFrame rows and re-parallelizing them.
With this change the assertion succeeded as expected:

    // Collect the Parquet rows to the driver, then re-parallelize them with jsc()
    List<Row> df =
        spark.read().parquet("src/test/resources/test_data.parquet").toJavaRDD().collect();
    JavaRDD<Row> input = jsc().parallelize(df);

    JavaRDD result = MyDriver.convertToMyCustomerData(input);
    JavaRDDComparisons.assertRDDEquals(expected, result);


My questions are:
1. What is the best way to compare RDDs built from DataFrames with RDDs
   obtained via jsc().parallelize()?
2. Is the solution above a suitable one?
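One order-insensitive option, sketched under the assumption that the test fixtures are small enough to collect() to the driver: compare the two collected lists as multisets (element to count), so partition order no longer matters but duplicate counts still do. MultisetCompare is a hypothetical helper; spark-testing-base ships its own comparison helpers, and whether a given one checks order is worth confirming in its documentation. This only works if both sides hold the same element type with a meaningful equals/hashCode: a list of Row objects will never equal a list of Strings.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Order-insensitive comparison of two collected results, e.g. expectedRdd.collect()
// and resultRdd.collect(). Duplicates are counted, so [a, a, b] differs from [a, b, b].
final class MultisetCompare {
    private MultisetCompare() {}

    // Counts how often each element occurs; relies on the element type's equals/hashCode.
    static <T> Map<T, Long> counts(List<T> items) {
        Map<T, Long> result = new HashMap<>();
        for (T item : items) {
            result.merge(item, 1L, Long::sum);
        }
        return result;
    }

    // True when both lists contain the same elements with the same multiplicities.
    static <T> boolean sameElements(List<T> expected, List<T> actual) {
        return expected.size() == actual.size() && counts(expected).equals(counts(actual));
    }
}
```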

I look forward to your answers.

Regards,
  Florin


Re: testing frameworks

2018-05-30 Thread Holden Karau
So Jesse has an excellent blog post on how to use it with Java
applications:
http://www.jesse-anderson.com/2016/04/unit-testing-spark-with-java/

--
Twitter: https://twitter.com/holdenkarau


Re: testing frameworks

2018-05-30 Thread Spico Florin
Hello!
I'm also looking for a way to unit test a Spark Java application. I've seen
the great work done in spark-testing-base, but it seemed to me that I could
not use it for Spark Java applications. Are only Spark Scala applications
supported?
Thanks.
Regards,
 Florin



Re: testing frameworks

2018-05-22 Thread umargeek
Hi Steve,

You can try out the pytest-spark plugin if you're writing programs using
PySpark; please find the link below for reference.

https://github.com/malexer/pytest-spark
  

Thanks,
Umar



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: [EXTERNAL] - Re: testing frameworks

2018-05-22 Thread Joel D
We’ve developed our own testing framework covering different areas of
checking, sometimes providing expected data and comparing it with the
resulting data from the data object.
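A hypothetical miniature of that kind of home-grown framework, assuming the job output has been collected into a list: each named check inspects the rows, one kind of check compares against expected data, and the runner reports every failing check by name instead of stopping at the first. OutputChecks and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical miniature of a home-grown checking framework: named checks over the
// collected output rows, evaluated in the order they were registered.
final class OutputChecks<T> {
    private final Map<String, Predicate<List<T>>> checks = new LinkedHashMap<>();

    // Registers a named check and returns this instance so calls can be chained.
    OutputChecks<T> add(String name, Predicate<List<T>> check) {
        checks.put(name, check);
        return this;
    }

    // Adds a check that the rows equal the expected data, in the same order.
    OutputChecks<T> expectExactly(String name, List<T> expected) {
        List<T> copy = new ArrayList<>(expected);
        return add(name, rows -> rows.equals(copy));
    }

    // Runs every check and returns the names of those that failed.
    List<String> failures(List<T> rows) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Predicate<List<T>>> entry : checks.entrySet()) {
            if (!entry.getValue().test(rows)) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }
}
```

For example, a row-count check, a no-nulls check and an expected-data check can be registered with add(...) and expectExactly(...) and then run together with failures(rows).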

Cheers.



RE: [EXTERNAL] - Re: testing frameworks

2018-05-22 Thread Steve Pruitt
Something more along the lines of integration testing, I believe: run one or
more Spark jobs and verify the output results, if that makes sense.
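For that style of integration test, one possible sketch, assuming the job writes uncompressed text output (for example with saveAsTextFile) into a local directory: read every part-* file, ignore markers such as _SUCCESS and the hidden .crc files, sort the lines so partitioning does not affect the result, and compare them with the expected lines. The JobOutput class is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Reads the part-* files a Spark job wrote into outputDir and compares their lines,
// ignoring order, with the expected lines. Markers like _SUCCESS and hidden .crc files
// do not match the "part-*" glob and are skipped.
final class JobOutput {
    private JobOutput() {}

    // All lines from every part file, sorted so partition boundaries do not matter.
    static List<String> sortedLines(Path outputDir) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                lines.addAll(Files.readAllLines(part, StandardCharsets.UTF_8));
            }
        }
        Collections.sort(lines);
        return lines;
    }

    static boolean matchesExpected(Path outputDir, List<String> expectedLines) throws IOException {
        List<String> expected = new ArrayList<>(expectedLines);
        Collections.sort(expected);
        return sortedLines(outputDir).equals(expected);
    }
}
```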

I am very new to the world of Spark. We want to include pipeline testing from
the get-go. I will check out spark-testing-base.


Thanks.



Re: testing frameworks

2018-05-21 Thread Holden Karau
So I’m biased as the author of spark-testing-base, but I think it’s pretty
OK. Are you looking for unit testing, integration testing, or something else?

On Mon, May 21, 2018 at 5:24 AM Steve Pruitt  wrote:

> Hi,
>
> Can anyone recommend testing frameworks suitable for Spark jobs?
> Something that can be integrated into a CI tool would be great.
>
> Thanks.
-- 
Twitter: https://twitter.com/holdenkarau