[ 
https://issues.apache.org/jira/browse/SPARK-28749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated SPARK-28749:
-------------------------------
    Description: 
As noted in SPARK-27550, we want to encourage testing of Spark 2.4.x with 
Scala-2.12, but kafka-0-8 does not support Scala-2.12.

Currently, the PySpark tests invoked by `python/run-tests` require the 
kafka-0-8 assembly jar. If it is not present, the run fails with this message:
 {code}
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "spark/python/pyspark/streaming/tests.py", line 1579, in <module>
    kafka_assembly_jar = search_kafka_assembly_jar()
  File "spark/python/pyspark/streaming/tests.py", line 1524, in search_kafka_assembly_jar
    "You need to build Spark with "
Exception: Failed to find Spark Streaming kafka assembly jar in spark/external/kafka-0-8-assembly. You need to build Spark with 'build/sbt -Pkafka-0-8 assembly/package streaming-kafka-0-8-assembly/assembly' or 'build/mvn -DskipTests -Pkafka-0-8 package' before running this test.

Had test failures in pyspark.streaming.tests with spark/py_virtenv/bin/python; see logs.
Process exited with code 255
{code}

This change is only targeted at branch-2.4, as most kafka-0-8 related materials 
have been removed in master and this problem no longer occurs there.

PROPOSED SOLUTION

The proposed solution is to make the kafka-0-8 stream tests optional in 
PySpark testing, in the same way the Kinesis stream tests already are, in 
`python/pyspark/streaming/tests.py`. The change is only a few lines.
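
A minimal sketch of the optional-test pattern described above, not the actual 
patch. The helper names `search_assembly_jar` and `select_test_cases`, the 
`target/scala-*` directory layout, and the skip message are assumptions made 
for illustration; the real code in `python/pyspark/streaming/tests.py` may be 
organized differently. The idea is that a missing assembly jar yields `None` 
rather than an exception, and the dependent test cases are left out of the run.

```python
import glob
import os


def search_assembly_jar(assembly_dir, jar_glob):
    """Return the single matching assembly jar path, or None if none is built.

    Hypothetical helper: it returns None instead of raising when the jar is
    missing, so the caller can skip the tests that depend on it.
    """
    pattern = os.path.join(assembly_dir, "target", "scala-*", jar_glob)
    jars = glob.glob(pattern)
    if not jars:
        return None
    if len(jars) > 1:
        # More than one candidate is ambiguous, so this is still an error.
        raise Exception("Found multiple assembly jars: %s" % ", ".join(sorted(jars)))
    return jars[0]


def select_test_cases(always_run, optional_groups):
    """Return (test cases to run, messages for skipped optional groups).

    optional_groups is a list of (label, jar_path_or_None, test_cases).
    """
    selected = list(always_run)
    skipped = []
    for label, jar, cases in optional_groups:
        if jar is None:
            skipped.append("Skipped %s tests: assembly jar not found" % label)
        else:
            selected.extend(cases)
    return selected, skipped
```

With this shape, a Scala-2.12 build made without `-Pkafka-0-8` would run the 
remaining streaming tests and report the kafka-0-8 group as skipped instead of 
failing the whole module.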

Ideally the kafka-0-8 tests would be optional only when SPARK_SCALA_VERSION >= 
2.12, but it turns out to be somewhat onerous to reliably obtain that value 
from within the Python test environment, and no other Python test code 
currently does so. So the proposed solution simply makes the kafka-0-8 profile 
optional, and leaves it to the tester to include it for Scala-2.11 test builds 
and exclude it for Scala-2.12 test builds.
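
For illustration only, the version check that was considered and not adopted 
might look like the sketch below. It assumes SPARK_SCALA_VERSION is exported 
into the test process, which is exactly the part that is not reliable; the 
function names are hypothetical. Versions are compared as integer tuples 
because a plain string comparison would order "2.9" after "2.12".

```python
import os


def parse_scala_version(value):
    """Turn a string such as '2.12' into a comparable tuple (2, 12)."""
    return tuple(int(part) for part in value.split("."))


def kafka_0_8_supported(environ=os.environ):
    """Return True or False when the Scala version is known, else None.

    Assumption: the version is read from SPARK_SCALA_VERSION. None means the
    caller cannot decide and has to fall back to another rule.
    """
    raw = environ.get("SPARK_SCALA_VERSION")
    if not raw:
        return None
    try:
        # kafka-0-8 does not support Scala 2.12, so only older versions pass.
        return parse_scala_version(raw) < (2, 12)
    except ValueError:
        return None
```

Returning `None` when the variable is unset or malformed shows why the simpler 
approach was chosen: without a dependable value, the decision falls back to 
whether the tester built the kafka-0-8 profile.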

PR will be available in a day or so.


> Fix PySpark tests not to require kafka-0-8 in branch-2.4
> --------------------------------------------------------
>
>                 Key: SPARK-28749
>                 URL: https://issues.apache.org/jira/browse/SPARK-28749
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Tests
>    Affects Versions: 2.4.3
>            Reporter: Matt Foley
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
