Poor HDFS Data Locality on Spark-EC2
Hi Spark users and developers,

I have been trying to use spark-ec2. After launching a Spark 1.4.1 cluster with ephemeral HDFS (Hadoop 2.4.0), I tried to run a job against data stored in that ephemeral HDFS. No matter what I try, there is no data locality at all. For instance, filtering data and counting the filtered result always runs at locality level ANY. I tweaked the spark.locality.wait.* settings, but the scheduler does not seem to care. My guess is that the hostnames cannot be resolved properly, so Spark cannot match executors to HDFS block locations. Has anyone run into this problem before?

Best Regards,
Jerry
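A minimal sketch of the two things worth checking here, assuming Spark 1.4.x on Scala (the HDFS path is a placeholder): the spark.locality.wait.* keys Jerry mentions, plus a quick way to compare the hostnames HDFS advertises for its blocks against the executor hostnames, which is the usual way to confirm the hostname-resolution theory.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("locality-check")
  .set("spark.locality.wait", "10s")       // global fallback (default 3s)
  .set("spark.locality.wait.node", "10s")  // how long to hold out for NODE_LOCAL
  .set("spark.locality.wait.rack", "10s")  // how long to hold out for RACK_LOCAL
val sc = new SparkContext(conf)

// Print the hostnames HDFS reports for the first few input splits. If these
// don't match the executor hostnames on the Spark UI's Executors tab (e.g.
// IPs vs. EC2 internal DNS names), every task falls through to ANY no matter
// how large the waits above are.
val rdd = sc.hadoopFile("hdfs:///data/input",  // placeholder path
  classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
rdd.partitions.take(3).foreach(p => println(rdd.preferredLocations(p)))
```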
Data locality in Spark
Hi guys,

I am running some SQL queries, and all my tasks are reported as either NODE_LOCAL or PROCESS_LOCAL. In the Hadoop world, the reduce tasks are RACK-local or off-rack because they have to aggregate data from multiple hosts. In Spark, however, even the aggregation stages are reported as NODE/PROCESS_LOCAL. Am I missing something, or why are the reduce-like tasks still NODE/PROCESS_LOCAL?

Thanks,
Robert
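For anyone wanting to check this beyond eyeballing the UI, a small sketch of logging the locality level each task actually ran at, using the public listener API (Spark 1.x; the class name is invented for illustration):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Logs one line per finished task with the locality level the scheduler
// actually assigned (PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY, ...).
class LocalityLogger extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val info = taskEnd.taskInfo
    println(s"stage=${taskEnd.stageId} task=${info.taskId} locality=${info.taskLocality}")
  }
}

// Usage: register on the SparkContext before running the queries.
// sc.addSparkListener(new LocalityLogger)
```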
Re: Data locality running Spark on Mesos
I tried two Spark stand-alone configurations.

First configuration:
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=1g
SPARK_WORKER_INSTANCES=6
spark.driver.memory 1g
spark.executor.memory 1g
spark.storage.memoryFraction 0.9
--total-executor-cores 60

Second configuration (same as the first, except):
SPARK_WORKER_CORES=6
SPARK_WORKER_MEMORY=6g
SPARK_WORKER_INSTANCES=1
spark.executor.memory 6g

Runs using the first configuration have faster execution times than Spark runs on my configuration of Mesos (both coarse-grained and fine-grained). Runs using the second configuration had about the same execution time as with Mesos. Looking at the logs again, the locality info between the stand-alone and Mesos coarse-grained modes is very similar; I must have been hallucinating earlier in thinking the data locality information was somehow different. So this whole thing might simply be due to the fact that it is not currently possible on Mesos to set the number of executors. Even in fine-grained mode, there seems to be just one executor per node (I had thought differently in my previous message). The workloads I've tried apparently perform better with many executors per node than with a single powerful executor per node. It would be really useful once the feature you mentioned, https://issues.apache.org/jira/browse/SPARK-5095, is implemented.

Spark on Mesos fine-grained configuration:
driver memory = 1g
spark.executor.memory 6g (tried also with 1g; still one executor per node, and execution time roughly the same)
spark.storage.memoryFraction 0.9

Mike
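For reference, a sketch of the two stand-alone layouts above as Spark sees them. The SPARK_WORKER_* split lives in spark-env.sh on each node (shown only in comments), so only the application-side half is expressible in SparkConf; spark.cores.max is the conf equivalent of --total-executor-cores:

```scala
import org.apache.spark.SparkConf

// Layout 1: many small executors per node.
//   spark-env.sh: SPARK_WORKER_CORES=1, SPARK_WORKER_MEMORY=1g, SPARK_WORKER_INSTANCES=6
//   (driver memory is passed via spark-submit --driver-memory, not SparkConf)
val manySmall = new SparkConf()
  .set("spark.executor.memory", "1g")
  .set("spark.cores.max", "60")              // = --total-executor-cores 60
  .set("spark.storage.memoryFraction", "0.9")

// Layout 2: one large executor per node.
//   spark-env.sh: SPARK_WORKER_CORES=6, SPARK_WORKER_MEMORY=6g, SPARK_WORKER_INSTANCES=1
val oneLarge = new SparkConf()
  .set("spark.executor.memory", "6g")
  .set("spark.cores.max", "60")
  .set("spark.storage.memoryFraction", "0.9")
```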
Re: Data locality running Spark on Mesos
Hi Michael,

I see you capped the cores to 60. I wonder what settings you used for the standalone mode you compared with? I can try to run an MLlib workload on both to compare.

Tim
Re: Data locality running Spark on Mesos
Hi Tim,

Thanks for your response. The benchmark I used just reads data in from HDFS and builds a linear regression model using methods from MLlib. Unfortunately, for various reasons, I can't open the source code for the benchmark at this time. I will try to replicate the problem using some of the sample benchmarks provided with the vanilla Spark distribution. It is very possible that I have something very screwy in my workload or setup.

The parameters I used for Spark on Mesos are the following:
driver memory = 1g
total-executor-cores = 60
spark.executor.memory 6g
spark.storage.memoryFraction 0.9
spark.mesos.coarse = true

The rest are default values, so spark.locality.wait should just be 3000ms. I launched the Spark job from a separate node outside the 10-node cluster using spark-submit.

With regard to Mesos in fine-grained mode, do you have a feel for the overhead of launching executors for every task? Of course, any perceived slowdown will probably be very dependent on the workload; I just want a feel for the possible overhead (e.g., a factor of 2 or 3 slowdown?). If it is not a data locality issue, perhaps this overhead is a factor in the slowdown I observed, at least in the fine-grained case.

BTW: I'm using Spark 1.1.0 and Mesos 0.20.0.

Thanks,
Mike
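The Mesos configuration Mike lists, rewritten as a SparkConf sketch for anyone trying to reproduce (Spark 1.1-era keys; the master URL is a placeholder, and driver memory is supplied via spark-submit --driver-memory rather than SparkConf):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("mesos://zk://master:2181/mesos") // placeholder Mesos master URL
  .set("spark.mesos.coarse", "true")           // coarse-grained mode
  .set("spark.executor.memory", "6g")
  .set("spark.cores.max", "60")                // = --total-executor-cores 60
  .set("spark.storage.memoryFraction", "0.9")
// spark.locality.wait left at its default of 3000ms
```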
Data locality running Spark on Mesos
Hi,

I've noticed that running Spark apps on Mesos is significantly slower compared to stand-alone or Spark on YARN. I don't think that should be the case, so I am posting the problem here in case someone has an explanation or can point me to some configuration options I've missed.

I'm running a LinearRegression benchmark with a dataset of 48.8GB. On a 10-node stand-alone Spark cluster (each node 4-core, 8GB of RAM), I can finish the workload in about 5 min (I don't remember exactly). The data is loaded into HDFS spanning the same 10-node cluster. There are 6 worker instances per node.

However, when running the same workload on the same cluster but now with Spark on Mesos (coarse-grained mode), the execution time is somewhere around 15 min. I also tried fine-grained mode, giving each Mesos node 6 vCPUs (to hopefully get 6 executors like the stand-alone test), and I still get roughly 15 min.

I've noticed that when Spark is running on Mesos, almost all tasks execute with locality NODE_LOCAL (even in coarse-grained mode). On stand-alone, the locality is mostly PROCESS_LOCAL. I think this locality issue might be the reason for the slowdown, but I can't figure out why, especially for coarse-grained mode, as the executors supposedly do not go away until job completion. Any ideas?

Thanks,
Mike
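Mike's benchmark isn't public, but a minimal stand-in with the same shape is easy to sketch: read labeled points from HDFS and train an MLlib linear regression (Spark 1.1-era API; the HDFS path and record format are invented for illustration).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

object LinRegBench {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LinRegBench"))

    // Assumed input format: "label,f1 f2 f3 ...", one record per line on HDFS.
    val points = sc.textFile("hdfs:///bench/linreg-48g").map { line =>
      val Array(label, feats) = line.split(",", 2)
      LabeledPoint(label.toDouble, Vectors.dense(feats.split(' ').map(_.toDouble)))
    }.cache()

    // Train; the locality of these tasks is what shows up in the UI as
    // PROCESS_LOCAL / NODE_LOCAL / ANY.
    val model = LinearRegressionWithSGD.train(points, 100 /* iterations */)
    println(s"weights: ${model.weights}")
    sc.stop()
  }
}
```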
Re: Data locality running Spark on Mesos
How did you run this benchmark, and is there an open version I can try it with? And what are your configurations, like spark.locality.wait, etc.?

Tim
Performance of Akka or TCP Socket input sources vs HDFS: Data locality in Spark Streaming
Hello!

Spark Streaming supports HDFS as an input source, and also Akka actor receivers and TCP socket receivers. For my use case I think it's probably more convenient to read the data directly from actors, because I already need to set up a multi-node Akka cluster (on the same nodes that Spark runs on) and write some actors to perform some parallel operations. Writing actor receivers to consume the results of my business-logic actors and then feed them into Spark is pretty seamless. Note that the actors generate a large amount of data (a few GBs to tens of GBs).

The other option would be to set up HDFS on the same cluster as Spark, write the data from the actors to HDFS, and then use HDFS as the input source for Spark Streaming. Does this result in better performance due to data locality (with HDFS data replication turned on)? I think performance should be almost the same with actors, since Spark workers local to the worker actors should get the data fast, and I assume some optimization like this is done. I suppose the only benefit of HDFS would be better fault tolerance, and the ability to checkpoint and recover even if the master fails.

Cheers,
Nilesh
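A hedged sketch of the actor-receiver route Nilesh describes, using the Spark 1.x streaming API (ssc.actorStream plus the ActorHelper mixin; the actor name, message type, and batch interval are assumptions):

```scala
import akka.actor.{Actor, Props}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.receiver.ActorHelper

// Receives results from the business-logic actors and hands each record to
// Spark Streaming via ActorHelper.store.
class ResultReceiver extends Actor with ActorHelper {
  def receive = {
    case msg: String => store(msg)
  }
}

object ActorIngest {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("ActorIngest"), Seconds(10))
    val stream = ssc.actorStream[String](Props[ResultReceiver], "result-receiver")
    stream.count().print() // stand-in for the real processing
    ssc.start()
    ssc.awaitTermination()
  }
}
```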
Re: Performance of Akka or TCP Socket input sources vs HDFS: Data locality in Spark Streaming
Hey Nilesh,

Great to hear you're using Spark Streaming. In my opinion the crux of your question comes down to what you want to do with the data in the future and/or whether there is utility in using it from more than one Spark/Streaming job.

1) One-time-use, fire and forget: as you rightly point out, hooking up to the Akka actors makes sense if the usefulness of the data is short-lived and you don't need the ability to readily go back into archived data.

2) Fault tolerance, multiple uses: consider using a message queue like Apache Kafka [1]. Write messages from your Akka actors into a Kafka topic with multiple partitions and replication, then use Spark Streaming job(s) to read from Kafka. You can tune Kafka to keep the last N days of data online, so if your Spark Streaming job dies it can pick up at the point it left off.

3) Keep indefinitely: files in HDFS, 'nuff said.

We're currently using (2) Kafka and (3) HDFS to process around 400M web clickstream events a week. Everything is written into Kafka and kept 'online' for 7 days, and also written out to HDFS in compressed date-sequential files. We use several Spark Streaming jobs to process the real-time events straight from Kafka. Kafka supports multiple consumers, so each job sees its own view of the message queue and all its events. If any of the streaming jobs die or are restarted, they continue consuming from Kafka from the last processed message without affecting any of the other consumer processes (see the sketch below).

Best,
MC

[1] http://kafka.apache.org/
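A sketch of option (2): a streaming job consuming from a replicated Kafka topic via the spark-streaming-kafka receiver API (Spark 1.x; the ZooKeeper quorum, group, and topic names are placeholders). Giving each streaming job its own consumer group is what lets restarts resume from the group's last offset without disturbing the other consumers.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ClickstreamJob {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("clickstream"), Seconds(5))
    val events = KafkaUtils.createStream(
      ssc,
      "zk1:2181,zk2:2181",           // ZooKeeper quorum (placeholder)
      "clickstream-job-1",           // consumer group: one per streaming job
      Map("web-clicks" -> 4))        // topic -> number of receiver threads
    events.map(_._2).count().print() // values are the raw event payloads
    ssc.start()
    ssc.awaitTermination()
  }
}
```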