Hyukjin Kwon updated SPARK-28749:
    Target Version/s:   (was: 2.4.4)

> Fix PySpark tests not to require kafka-0-8 in branch-2.4
> --------------------------------------------------------
>                 Key: SPARK-28749
>                 URL: https://issues.apache.org/jira/browse/SPARK-28749
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Tests
>    Affects Versions: 2.4.3
>            Reporter: Matt Foley
>            Priority: Minor
> As noted in SPARK-27550, we want to encourage testing of Spark 2.4.x with
> Scala 2.12, and kafka-0-8 does not support Scala 2.12.
> Currently, the PySpark tests invoked by `python/run-tests` require the
> kafka-0-8 assembly jar to be present. If it is missing, the run fails with
> the following message:
> {code}
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
>     "__main__", fname, loader, pkg_name)
>   File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
>     exec code in run_globals
>   File "spark/python/pyspark/streaming/tests.py", line 1579, in <module>
>     kafka_assembly_jar = search_kafka_assembly_jar()
>   File "spark/python/pyspark/streaming/tests.py", line 1524, in search_kafka_assembly_jar
>     "You need to build Spark with "
> Exception: Failed to find Spark Streaming kafka assembly jar in spark/external/kafka-0-8-assembly. You need to build Spark with 'build/sbt -Pkafka-0-8 assembly/package streaming-kafka-0-8-assembly/assembly' or 'build/mvn -DskipTests -Pkafka-0-8 package' before running this test.
> Had test failures in pyspark.streaming.tests with spark/py_virtenv/bin/python; see logs.
> Process exited with code 255
> {code}
> This change targets only branch-2.4, since most kafka-0-8-related code has
> been removed from master and the problem no longer occurs there.
> The proposed solution is to make the kafka-0-8 streaming tests optional for
> PySpark testing, exactly as the Kinesis streaming tests already are, in
> `python/pyspark/streaming/tests.py`. The change is only a few lines.
> Ideally, the kafka-0-8 tests would be made optional only when
> SPARK_SCALA_VERSION >= 2.12, but reliably obtaining that value from within
> the Python test environment turns out to be somewhat onerous, and no other
> Python test code currently does so. The proposed solution therefore simply
> makes the kafka-0-8 profile optional, leaving it to the tester to include it
> for Scala 2.11 test builds and exclude it for Scala 2.12 test builds.
> PR will be available in a day or so.
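A minimal sketch of the proposed idea, assuming the Kinesis-style pattern in which a missing assembly jar causes the related suites to be skipped instead of raising an exception. The helper names `search_optional_assembly_jar` and `select_test_suites`, and the jar search patterns, are hypothetical illustrations. They are not the actual code in `python/pyspark/streaming/tests.py` or in the eventual PR.

```python
import glob
import os
import sys


def search_optional_assembly_jar(project_dir, jar_prefix):
    """Return the path of the single assembly jar under project_dir, or None.

    Hypothetical helper (illustrative only, not the actual Spark test code):
    a missing jar is reported as None so the caller can skip those tests,
    instead of raising an Exception as in the traceback above.
    """
    # Assumed layouts: sbt-style (target/scala-*/) and Maven-style (target/).
    patterns = [
        os.path.join(project_dir, "target", "scala-*", jar_prefix + "*.jar"),
        os.path.join(project_dir, "target", jar_prefix + "*.jar"),
    ]
    jars = []
    for pattern in patterns:
        jars.extend(glob.glob(pattern))
    # Ignore auxiliary artifacts that are not the assembly jar itself.
    jars = [j for j in jars
            if not j.endswith(("-sources.jar", "-javadoc.jar", "-tests.jar"))]
    if not jars:
        return None
    if len(jars) > 1:
        # Still fail loudly on ambiguity: more than one candidate jar found.
        raise Exception("Found multiple %s jars in %s: %s"
                        % (jar_prefix, project_dir, ", ".join(sorted(jars))))
    return jars[0]


def select_test_suites(base_suites, kafka_jar, kafka_suites):
    """Append the kafka-0-8 suites only when their assembly jar was found."""
    suites = list(base_suites)
    if kafka_jar is None:
        sys.stderr.write("Skipping kafka-0-8 streaming tests: "
                         "assembly jar not found.\n")
    else:
        suites.extend(kafka_suites)
    return suites
```

With this shape, a Scala 2.12 build that omits `-Pkafka-0-8` simply skips the kafka-0-8 suites, while a Scala 2.11 build that includes the profile still runs them.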

This message was sent by Atlassian JIRA
