Zef Wolffs created LIVY-767:
-------------------------------
Summary: Spark-Cassandra-Connector's joinWithCassandraTable makes
spark use only two executors
Key: LIVY-767
URL: https://issues.apache.org/jira/browse/LIVY-767
Project: Livy
Issue Type: Bug
Components: Interpreter
Affects Versions: 0.7.0
Environment: Spark and Livy running in containers on Kubernetes.
Deployed using Microsoft's Helm chart:
https://hub.helm.sh/charts/microsoft/spark.
Reporter: Zef Wolffs
I have a script that uses joinWithCassandraTable, a method provided by the
spark-cassandra-connector. When launched directly through spark-submit, this
script usually claims all available executors (about 25 in my setup) for this
method. However, when I run the same script through Livy, Spark only claims
two executors, which makes the job much slower.
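One possible explanation (an assumption on my part, not confirmed) is that the Livy-created session does not carry the same executor settings as the spark-submit launch, e.g. dynamic allocation being off or capped. As a workaround sketch, executors can be requested explicitly when creating the session via Livy's POST /sessions REST API; the numeric values below are placeholders for my setup:

{noformat}
POST /sessions
{
  "kind": "spark",
  "numExecutors": 25,
  "executorCores": 2,
  "conf": {
    "spark.dynamicAllocation.enabled": "false"
  }
}
{noformat}

With dynamic allocation disabled, "numExecutors" fixes the executor count for the session; alternatively, leaving dynamic allocation enabled and setting "spark.dynamicAllocation.maxExecutors" in "conf" should allow the job to scale up on its own.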
The script is the following:
{noformat}
// Defines the partition key
case class TimeSeriesContainerID(timeseriescontainerid: Int)

val timeSeriesContainerIDofInterest =
  sc.parallelize(1 to 20).map(TimeSeriesContainerID(_))

val rdd = timeSeriesContainerIDofInterest
  .joinWithCassandraTable[(Timestamp, Int)]("test_ts_partitions", "ts_0")

val aggregatedRDD = rdd
  .select("timestamp", "tsvalue")
  .map(s => s._2)
  .reduceByKey((a, b) => a + b)
  .collect
  .head
{noformat}
I am happy to provide more information as requested.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)